Archives: May 2008

Bringing the Semantic Web to Education

Jennifer Zaino Contributor

Remember that commercial: “It’s 10 p.m. Do you know where your children are?”

Learn how the Semantic Web is changing the way we treat data at the LinkedData Planet Conference. Sir Tim Berners-Lee, inventor of the World Wide Web and director of the W3C, is among the event’s keynote speakers.

These days, chances are that they’re as likely to be surfing the web as anywhere else — or, more specifically, whiling away the late-night hours on Facebook, MySpace, or YouTube. That’s not necessarily a bad thing, but wouldn’t it be nice if there were an alternative social networking space where they could interact with each other and learn in a creative environment?

“A parallel vision is to get kids sticky on education, instead of pure socialization in the network,” says Rebecca Dias, VP of Software Development at SynapticMash, a vendor of adaptive learning systems based on semantic web technologies.

Dias will be speaking about the social internet and its potential promise at the upcoming LinkedData Planet conference in New York City on June 17 and 18. “We want to create a creative space for learning where kids can cross-educate themselves.”

In the summer, SynapticMash plans to release its MashQube environment, an ecosystem for educators, parents, and students to collaborate on curriculum development, content, and learning.

Its infrastructure platform services, hosted off-site, use semantic web technologies to integrate data from the various educational technologies required to support such a collaborative environment. The same infrastructure underpins the company’s LearningQube tools, which provide secure, real-time access to student data; proactively inform teachers about student trends based on analysis of information such as assessment scores, grades, and attendance; and work with district student information systems to publish data to parents, students, teachers, and administrators. As it turns out, there are a lot of similarities between enterprise and educational environments — and not always positive ones.

“Education has nightmarish integration problems, silos of data, not well-described or easily accessed, possibly bound to legacy systems, and there may be licensing restrictions that don’t allow access to certain data,” says Dias.

If you thought business deadlines were tough to meet, consider the plight of the teacher with 120 kids and just a 15-minute window to prepare for the next day’s classes.

Read more

The ‘Digital Human’ and the Semantic Web

Jennifer Zaino Contributor

No doubt about it, the technology behind the semantic web is truly disruptive. The challenge for companies will be to figure out how to provide applications based on it that match business drivers and meet business goals. Oh, and those applications had better account for the fact that understanding the “digital” human component — within and without organizations — is critical to those ends.

“With a disruptive technology such as the semantic web, there’s a need to really talk about the current state of how we develop semantic technologies and applications,” says Dr. Rachel Yager, director of semantic web company Machintas Inc., which is developing technologies to enable people to better represent themselves in their digital lives. “Not small-scale stuff, but large, enterprise-strength applications.”

In a presentation to be delivered at the LinkedData Planet conference taking place June 17 and 18 in New York, she’ll be doing just that, in a discussion that explores the challenges and approaches for effective semantic systems development. “In the software development lifecycle, what are the best approaches and processes and methodologies that one can use to start looking into semantic applications, and to cater to the evolving technologies in this exciting domain,” Yager says.

Indeed, the semantic web community is vibrant … and at a crossroads in terms of making this disruptive technology viable, says Yager.

“I really think that when we are dealing with the semantic technologies of today, there are some positions that a company can take for the future to better harness the evolving nature of this technology,” she says.

Semantic web technologies can help close the gaps that still exist in IT environments in terms of scalability, flexibility, and agility, and companies can take advantage of that to develop enterprise-class systems that can uncover and exploit the linkages among people, their own understanding of concepts, other data sets, and mass intelligence.

Don’t underestimate how important it is to consider the people part of the link. Human beings and their digital lives are now part of the web fabric, says Yager, and the line between the real and the virtual human is growing increasingly fuzzy. “The human person is part of the link, of the linked data planet — in fact, the most important link,” she says. “We’re in pursuit of flexibility in the name of building agile software for connecting better with people, representing people better in this linked data planet — that is needed.” Human relationships need to be understood to meet business objectives, she says.

For example, a business’ core assets are its employees, and the knowledge that is linked to those people, individually and as part of the collective intelligence, is critical to fulfilling enterprise missions.

“What distinguishes one company from another is the culture and the people and how they make things work,” says Yager. “The representation of people and the understanding of people and linking all that information — we now have a powerful and new way that we can have a richer representation of ourselves in the data on the web. And companies have to care about that, because their employees are going to be part of that.” The same is true about leveraging that understanding on the customer front. “There will be a stage when humans are the content,” Yager says.

Machintas, with its expertise in computational intelligence, is headed in the direction of helping companies understand the human part of the equation by developing technologies that enable people to represent themselves better in their digital lives. It aims to bridge the gap between the way a computer and a human think, and to better link the human to the computer. Among the technologies it is working on to achieve these ends is adding a new dimension to semantics in the area of granular computing. Granular computing, Yager says, is a confluence of technologies, including “fuzzy sets,” that allow someone to represent rich human concepts in a formal way in the computer. That’s a challenge, as concepts such as young and old, rich and poor, and even risk can mean something very different to different people.
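The fuzzy-set idea behind granular computing can be shown in a few lines. The sketch below is a generic illustration of fuzzy membership, not Machintas’s technology; the concept “young” and the breakpoint ages are invented for the example.

```python
# Fuzzy sets model vague concepts like "young" as a degree of
# membership in [0, 1] rather than a crisp yes/no category. The
# breakpoints (25, 45) are arbitrary choices for illustration;
# different people or applications would tune them differently,
# which is exactly the point about concepts meaning different
# things to different people.
def young_membership(age: float) -> float:
    """Degree to which `age` counts as 'young' (1.0 = fully, 0.0 = not at all)."""
    if age <= 25:
        return 1.0
    if age >= 45:
        return 0.0
    # Linear ramp between the fully-young and not-young breakpoints.
    return (45 - age) / (45 - 25)

print(young_membership(20))  # fully young -> 1.0
print(young_membership(35))  # partially young -> 0.5
print(young_membership(60))  # not young -> 0.0
```

A formal representation like this lets software reason with a rich human notion instead of forcing an arbitrary cutoff.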

Let the Machine Do the Learning

Jennifer Zaino Contributor

Machine learning and the semantic web go hand in hand in exploring and exploiting the continuum between structured and unstructured data to connect diverse sources of knowledge on a large scale.

One expert put it this way: “Technically, people used to make strong distinctions between unstructured data in free text, and structured data that was digested and put into a database that people could use,” says Dr. William Cohen, associate research professor at Carnegie Mellon University’s Machine Learning Department. He’ll be speaking on the topic of using machine learning to discover and understand structured and unstructured data at the LinkedData Planet Conference, June 17-18 in New York.

“But there is a continuum between these. Web sites, for instance, have information with some structure — tables and lists, often derived from an underlying database but presented in a way people can understand. It’s intended for the human user, not the computer,” Cohen says.
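The structure Cohen describes in web tables can be recovered mechanically before any learning happens. As a rough illustration (the HTML snippet and its field names are invented), a few lines of standard-library Python pull the rows back out of markup that was intended for human readers:

```python
# A first, purely mechanical step toward recovering database-like
# structure from a web page: extract rows and cells from a table.
# Real systems layer machine learning on top of this to decide
# what the columns actually mean.
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.rows = []          # completed rows of cell text
        self._row = []          # cells of the row being parsed
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row:
            self.rows.append(self._row)

page = ("<table><tr><th>Customer</th><th>City</th></tr>"
        "<tr><td>Acme</td><td>Boston</td></tr></table>")
p = TableExtractor()
p.feed(page)
print(p.rows)  # [['Customer', 'City'], ['Acme', 'Boston']]
```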

For the semantic web’s capabilities to be realized, it needs machine learning to make the connections among these pieces of information, in whatever format and from whatever source, on a large scale. Consider, for example, a large organization that is the product of many acquisitions over the years, where different sub-organizations have different relationships with the same customer, expressed in different formats. Understanding that customer in the context of the whole organization is labor-intensive and technically hard with traditional rules-engineering approaches, and many knowledge-engineering approaches break down as data sources grow larger and more diverse.

“The way it’s done today, it’s labor intensive and costly. The goal is to do it better, faster, and cheaper, and on a broader scale,” Cohen says.
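As a toy version of the customer-matching problem described above, a simple string-similarity ratio can stand in for the learned matching models real systems use; the company names and the 0.6 threshold are invented for illustration:

```python
# The same customer appears under different spellings in systems
# that arrived through different acquisitions. difflib's ratio()
# is a crude stand-in for learned record-linkage models.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

crm_names = ["Acme Corporation", "Globex Inc."]
billing_names = ["ACME Corp", "Initech LLC"]

# Link each billing record to its best CRM match above a threshold.
links = []
for b in billing_names:
    best = max(crm_names, key=lambda c: similarity(b, c))
    if similarity(b, best) > 0.6:
        links.append((b, best))
print(links)  # "ACME Corp" links to "Acme Corporation"; "Initech LLC" links to nothing
```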

Read more

SemTech’s Wave of Semantic News

Jennifer Zaino Contributor

There was plenty of product news coming out of this week’s Semantic Technology conference in San Jose, with companies including Thomson/Reuters, TopQuadrant and Expert System making significant product announcements.

Other noteworthy announcements include:

  • Ontoprise, one of the builders behind Vulcan Research’s Project Halo, announced OntoBroker 5.1 and OntoStudio 2.1. The company touts the Ontoprise product suite as being the first to support all major World Wide Web Consortium (W3C) Semantic Web recommendations, including OWL, RDF, RDFS, and SPARQL, as well as the F-Logic industry standard.

    With version 5.1, the OntoBroker RDF triple store can transform between OWL, RDF, and F-Logic via the same ontology management API, so users can choose the right ontology language for the right task. It also provides improved multi-core performance.

    A new collaboration server lets OntoBroker act as a central server with remote access, allowing for the distributed usage, management, and editing of ontologies, and for collaborative modeling with OntoStudio 2.1, the vendor says.

    OntoStudio itself offers new features, including the ability to separate schema and facts and re-use central elements in different ontologies, and an improved mapping tool to help knowledge engineers quickly map heterogeneous data sources, the company says.

    The company also announced that it will be developing an expert system for the largest hydroelectric power plant in Southeast Asia. The expert system from Ontoprise, called CEXS (Computer Guided Shell Expert System), will support the operating staff of the Bakun hydroelectric plant in Malaysia in detecting possible malfunctions, helping to avoid outages through the use of rules and expert knowledge, Ontoprise says.

  • Franz Inc. has introduced AllegroGraph 3.0, a high-performance triplestore database designed to help companies glean insight from their troves of unstructured data, which are on the rise with the growth of social networks and semantic web applications.

    Capable of storing and querying billions of RDF statements, the company says the product provides customers an events-based view (what type of event, who was there, start and end time, and location) of data sets, with the goal of helping them speedily link various pieces of information and reason through it.

    The vendor also is aiming at helping developers learn how to create scalable applications for the semantic web: It has introduced a new Learning Center for that purpose, and to help drive understanding of RDF database technologies and best practices for its software.

  • Pragati Synergetic Research released Expozé 2.0, which analyzes complex information systems to facilitate their interoperability, reuse, knowledge capture, and quality assurance. Expozé is an integrated set of modules that lets users dissect and analyze knowledge systems, ranging from structured sources such as database schemas, to semistructured sources, like OWL and RDF-based ontologies and rule-based systems, to unstructured sources, such as natural language text, the vendor says.

    “We are confident that Expozé’s niche technology is the missing piece for overcoming the semantic mediation bottleneck on the semantic web,” said Mala Mehrotra, president and CEO of Pragati, in a statement. The vendor has customers in both the emergency management and military sectors.

  • Professional services firm Zepheira, which specializes in semantic technologies and enterprise data integration, said it had facilitated the integration of two leading open source semantic web frameworks.

    These are the Mulgara semantic store (a scalable RDF data store, written in Java and designed to scale to hundreds of millions of RDF statements) and Aduna’s Sesame Version 2.2, an open source Java framework for the storage, inferencing and querying of RDF data.

    David Wood, a Zepheira partner and Mulgara project member, said in a statement that he sees tremendous value in the integration of the projects, their developers and their communities of users.

    “The comprehensive set of APIs and standards from Sesame along with the scalability and speed of Mulgara gives developers a single port of call for Semantic Web projects of all levels,” said Paul Gearon, founder and leader of the Mulgara Project, in a statement. “We have now combined the experience and skills of the developers from both projects, giving us much greater capacity for ongoing development than when our efforts were split between unrelated systems.”

Technology Review – Operator

As I was preparing for my upcoming embedded RDF/microformats talk at the Semantic Technology Conference, I wasn’t too surprised at the level of discourse that has transpired between the RDF evangelists and the microformat followers. I’ve been around long enough to remember the object-oriented vs. relational database arguments, various interpreted vs. compiled language brouhahas, and CORBA vs. the world. Many of the same competing themes reappeared: performance, ease of use, flexibility, and learning curves, to name a few.

Read more
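Several of the announcements above (AllegroGraph, Mulgara, Sesame) revolve around storing and querying RDF statements. At bottom, such a store holds (subject, predicate, object) triples and answers wildcard pattern queries. This toy in-memory sketch shows the idea only; it reflects none of any vendor’s API, and omits the indexing that lets real stores scale to billions of statements. The event data is invented.

```python
# A minimal in-memory "triple store": RDF statements are
# (subject, predicate, object) tuples, and queries are patterns
# in which None acts as a wildcard.
triples = [
    ("event42", "type", "Meeting"),
    ("event42", "location", "San Jose"),
    ("event42", "attendee", "Alice"),
    ("event42", "attendee", "Bob"),
]

def match(s=None, p=None, o=None):
    """Return all triples matching the (s, p, o) pattern."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# An "events-based view" query: who attended event42?
print([o for _, _, o in match("event42", "attendee")])  # ['Alice', 'Bob']
```

Production stores answer the same kind of pattern query, but over indexed, disk-backed data and through standard query languages such as SPARQL.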

TopQuadrant Updates Semantic Apps Platform

Paula Gregorowicz Contributor

Semantic web software vendor TopQuadrant on Monday announced the release of TopBraid Live™ 2.0, a semantic application deployment platform for the enterprise designed to simplify the creation of web services to a “click and connect” process.

It marks the start of the semantic web’s movement out of the halls of academic exercise and into the hands of power users in the business. What makes this announcement significant is that it demonstrates the strategic direction of the semantic web — using a standards-based platform to bridge the gap between strategic goals, business processes, and technology.

The functionality of TopBraid Live is based on a combination of SPARQLMotion Web Services and a Flex API.

SPARQLMotion is a visual scripting language that allows power users within the business to create semantic web applications. End users can integrate data sources, run queries on the combined data, and create information mash-ups and reports without assistance from the IT department. SPARQLMotion is fully compliant with and utilizes the W3C standard SPARQL.

Flex API is a client-side API that can be used to deliver internet applications using Adobe’s platform-independent Flash Player. It provides out-of-the-box components to display and edit semantically enabled information.

TopBraid Live allows users to make existing data available to semantic search engines such as Yahoo! SearchMonkey.

TopQuadrant also announced that the TopBraid Suite™ now offers integration with Oracle 11g. This allows companies to build semantic applications that access the native Resource Description Framework (RDF) storage capabilities within Oracle Spatial, an option to Oracle Database 11g Enterprise Edition.

In addition, as a part of the Oracle PartnerNetwork, TopQuadrant developed an Oracle rules editor that enables business users and domain experts to define the business rules and application reasoning within a semantic application, without having to use SQL.

Standards such as RDF and OWL serve as the foundation for the semantic web. This integration marks the first application that can leverage Oracle’s RDF Data Model as a scalable, secure, and reliable RDF management platform, TopQuadrant says.

Perhaps the biggest barrier to adoption of semantic web technology, TopQuadrant officials say, is that there is no good standards-based enterprise platform that can connect to various best-practice data stores. “TopQuadrant is building that very platform,” say company COO Robert Coyne and chief scientist Dean Allemang.

TopQuadrant officials say the company’s strategy is to invest heavily in standards-based technologies and to make the semantic web accessible to both business and technical users.

“Making it relevant so it can support users in business processes” is how Coyne and Allemang explained it during a recent phone briefing.

While TopQuadrant has been around since 2001 as a pure semantic web consulting firm, the company recently hired Dr. Jeremy Carroll as chief product architect. Carroll has contributed to many semantic web standards from the World Wide Web Consortium (W3C) and was the lead architect in the creation of Jena 2.0, an open source Semantic Web framework developed at Hewlett-Packard Labs.

Calais’ Second Step

Jennifer Zaino Contributor

Thomson Reuters’ Calais service is hitting Version 2, with a number of improvements, including the Marmoset plug-in for Yahoo’s SearchMonkey service. The company is announcing the news today at the Semantic Technology Conference (SemTech).

“Marmoset takes unstructured data and feeds it to the monkey,” says Thomas Tague, Calais evangelist and project lead at Thomson Reuters. Any site owner can embed the code in their template files, and when SearchMonkey comes by to index the site, Marmoset wakes up and sends the content on the page to the Calais web service, which performs metadata generation on the fly and returns the metadata as microformats embedded in the page for SearchMonkey to harvest.
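A rough sketch of the kind of pipeline Tague describes, with heavy caveats: the real Calais request format and output are not shown here, `extract_entities` is a hypothetical stand-in for the web-service call, and the markup classes are invented rather than an actual microformat vocabulary.

```python
# Sketch of the Marmoset-style flow: page text goes out to a
# metadata service, and the entities that come back are embedded
# in the page markup for a crawler to harvest.
def extract_entities(text):
    # Hypothetical stand-in: a real implementation would POST
    # `text` to the metadata web service and parse its response.
    return {"Thomson Reuters": "organization", "New York": "location"}

def embed_microformats(html_text):
    """Wrap each recognized entity in markup a crawler can harvest."""
    for name, kind in extract_entities(html_text).items():
        tagged = '<span class="entity %s">%s</span>' % (kind, name)
        html_text = html_text.replace(name, tagged)
    return html_text

page = "<p>Thomson Reuters opened an office in New York.</p>"
print(embed_microformats(page))
```

The payoff is that the page itself needs no hand-authored semantic metadata; the markup is generated on the fly from the page’s own content.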


“Our issue with SearchMonkey was its limitations,” says Tague. “The first is that there’s not a lot of semantic metadata out there to scrape, and the second is that, for unstructured content like news, there is even less where people took the time to create semantic metadata. So we created Marmoset.”

The company is also announcing an open source plug-in built by Phase2 Technology: a Drupal module for Calais that lets mid-tier and smaller newspaper publishers – which increasingly are starting to use the Drupal open source content management system as their publishing platform – automatically attach semantic metadata to any of their content.

Tague calls this in some ways a more powerful version of the WordPress plug-in for bloggers that Thomson Reuters is also unveiling with this release. The WordPress plug-in addresses bloggers’ laments that rich tagging is a pain, and that finding images that are pertinent and copyright-acceptable is even more of a pain. The new plug-in returns tag suggestions based on text typed into a blog, but gives users the option of choosing which they want to apply.

Then, says Tague, “the cool stuff starts to happen.” Calais finds pictures on Flickr that match the tag or any combination of tags, ensures they are copyright-acceptable, and then gives bloggers the option to size the image and write hyperlink text for it before the final image is inserted into the blog post. “That’s not going to change the world, but our goal was to make blogging more fun. But more importantly, it’s about how you thread the needle between a pure folksonomy and a pure taxonomy,” says Tague.

Expanding core API capabilities

He puts the cost of developing the WordPress plug-in at about $50,000. That’s some $45,000 more than the bounty the company was originally offering to developers to create such a plug-in. Tague says that the company received only about half a dozen entries, and none of them was the great application it had hoped for.

“Something went wrong and we didn’t get the attention of the major players there,” he says. “I did a full disclosure on the [Calais] blog that the right thing is not to accept any of these and instead hire a company to develop this for us that knows how to develop production-strength WordPress plug-ins.”

It may get more of the attention it wants from the development community with a revamp of its web site to include more sharing and community-based capabilities. Some 3,000 developers have registered so far. The other big focus of the last quarter, Tague says, has been some expansions around Calais’ core API capabilities. That includes rolling out two new output formats – simple tags and microformats to tag pages on the fly – to supplement RDF, which many of its users find to be just too much overhead for their needs.

“The other side is more subtle but long-term more interesting,” he says, and that is the rolling out of dozens of new entities, primarily in areas like pop culture.

“What’s important is we proved a model where we can use open data sources, like Freebase, combined with NLP (Natural Language Processing) to generate entities more rapidly. Over the coming year that will let us dramatically expand the knowledge domains Calais covers,” he says. “We’ve always been strong at business news; now we’re getting strong at entertainment news and sporting news, and we’re looking into biotechnology. We’re getting requests for things we didn’t expect. But we want to be the semantic plumbing for everyone, so we must expand our domains rapidly.”

Semantic Web Shows Signs of Maturity

Jennifer Zaino Contributor

When the 5th European Semantic Web Conference gets underway June 1 in Tenerife, Spain, attendees will have more to look forward to than the beauty of the sunshine-infused island in the Canary Archipelago. They’ll also have in their sights evidence that the semantic web industry is reaching maturity.

“Looking at the number of papers [submitted], there are quite a few talking about real applications that people are building,” says Sean Bechhofer, a researcher at the School of Computer Science, University of Manchester, who is one of the program chairs of the conference. “It gives an indication of the fact that this field has kind of reached a level of maturity, where the technologies are mature in that you can begin to build things on top of them.”

The community is less worried about whether the underlying technologies will break, he says.

Some of the speakers at the conference will be addressing the challenges still posed in deploying semantic technologies for end-user applications, including Nigel Shadbolt of the University of Southampton, who is also CTO of semantic start-up Garlik. Bechhofer hopes to hear participants’ thoughts on the user interface issue, for one.

“How do we interact with all this data that’s out there? I would personally hope that it should be the kind of thing that in some ways you wouldn’t notice — that it would just be ubiquitous and part of the framework of your interaction, that you don’t know that you are being pushed semantic data. Your life is easier and you may not know why.”

One area that is getting a lot of traction is applications that deal with cultural heritage, such as those relating to museum collections. It makes sense, says Bechhofer, as that world has always had a tradition of rich metadata for classifying, annotating, and documenting objects. “They’re in sympathy with the whole notion of semantic metadata, that rich annotations are a good thing in helping to maintain and organize a collection,” he says.

The fifth incarnation of this conference saw 275 submissions, of which 51 papers were accepted for the research track. Among the topics they explore are ontology creation, and content creation, annotation, and extraction.

“One would expect that. We’re at the point where you need to populate this stuff. We need content to be able to demonstrate the benefit of the semantic web approach,” says Bechhofer. A number of papers cover semantic web services, relating to additional annotations on web services to aid in discovery, coordination, and composition. The SPARQL query language is getting its fair share of attention too, with research into extending and implementing it.

The conference this year will also feature panel discussions as a new element, addressing topics including social network data portability. “We have many social network sites around where users have multiple profiles, and there’s no easy mechanism to move them around from site to site. Is semantics something that could help us with that problem?” Bechhofer says.

Another panel will explore how semantics fits into the new field of web science, which aims to understand the various factors — scientific, technical, and social — that drive the growth of the web.

Selling and overselling the Semantic Web

We got on the train together. I had just finished a four-day training/consulting session with a company doing information integration for international security. She was doing a master’s degree, with a thesis about ontologies. Like a good grad student, she was a voracious reader. She had read white papers, research papers, books, web pages, magazine articles, and anything else she could get her eyes on. The more she read, the more confused she became.

Read more

What’s Next: Intelligence at the Interface

Jennifer Zaino Contributor

Intelligence at the interface — it’s just over the semantic horizon.

“This concept is what I really believe is under-emphasized when people get excited about the bottom-up technology of semantics,” says Tom Gruber, a pioneer in the field of knowledge sharing and collaboration on the web who established the DARPA Knowledge Sharing Library.

Gruber, who also co-founded RealTravel and Intraspect Software and is founder and chief scientist at organizational effectiveness consultancy Consider Solutions, hates to use the phrase, but he sees a “paradigm shift” underway. The world is moving from hyperlinks, portals, and search engines, where the onus is still on the user to figure out how to get the information he wants, to intelligence at the interface.

“The breakthrough is we’ve gone from where the user has to be relatively smart to say the magic words, to where the system is relatively smart now, and the user can more or less sit back. It’s heads-back interaction. You’re just driving, you’re just typing,” he says. “And the system knows enough about your preferences, your needs, where you are, and what you’ve done, to be able to advise you proactively,” leveraging the collective intelligence of the social web.

It’s already beginning to happen with services such as Twine, which helps users organize, share, and discover information without having to meticulously bookmark, categorize, and tag it themselves. What has yet to happen is to bring all the dimensions together — location and time awareness, your social networks, your trusted sources, and so on — in a single service.
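One way to picture the proactive advice Gruber describes is a system that scores candidate items against a stored user profile instead of waiting for a precise query. Everything in this sketch (the profile fields, the weights, the items) is invented for illustration:

```python
# Proactive recommendation in miniature: the system already knows
# the user's interests and location, so it can rank items without
# the user having to "say the magic words."
user = {"interests": {"travel", "photography"}, "city": "San Jose"}

candidates = [
    {"title": "Photo walk downtown", "topics": {"photography"}, "city": "San Jose"},
    {"title": "Tax law seminar", "topics": {"law"}, "city": "Boston"},
]

def score(item):
    overlap = len(item["topics"] & user["interests"])   # shared interests
    nearby = 1 if item["city"] == user["city"] else 0   # location awareness
    return 2 * overlap + nearby  # interests weighted above proximity

best = max(candidates, key=score)
print(best["title"])  # Photo walk downtown
```

A real service would fold in far more dimensions (time, social networks, trusted sources) and learn the weights rather than hard-code them.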

“Richer data and more rich inferencing produces a kind of emergent service, the quality of which isn’t available today,” he notes.

Gruber, who will address this issue as part of his presentation at the upcoming Semantic Technology Conference (SemTech), being held May 18-22 in San Jose, Calif., sees this computing on behalf of users as the natural result of our online lives being lived in relative transparency. “We’re already giving up our privacy and exposing ourselves to the infrastructure — let’s get the infrastructure to make maximum use of that information and bring the intelligence of the system to that interface,” he says.

Read more
