Data Connections: Promises, Problems
Jennifer Zaino
SemanticWeb.com Contributor
Stormy clouds ahead for the semantic web? In a session Wednesday at the SemTech Conference in San Jose, Larry Lefkowitz, executive director of Cycorp Inc., will be speaking about turbulence around linked open data and ontologies.
Cycorp has been building the Cyc ontology for the past 25 years, and last year it released its open source OpenCyc ontology as a downloadable OWL file and made all of that content available.
The problem Lefkowitz identifies concerns the complexities that aren’t being accounted for as more parties rush to get data represented in a structured format so that computers can reason with it. Simplicity is required for usability, Lefkowitz agrees, but the risk of too much simplicity is the creation of ontologies that are imprecise or represent things in ways that make it hard to connect them to the Cyc ontology or others.
“The knowledge that is being captured has to be captured in a more precise form to be useful for any kind of reasonably complex stuff,” he says.
Over the last year, Cycorp has been trying to link its concept-focused Cyc ontology to a number of external structured datasets, ontologies, taxonomies, and thesauri, so Lefkowitz is seeing firsthand some of the issues that can arise. The company sees synergy in linking its ontology with others that take things down to the specifics, such as the names of baseball players on every team or the governors of every state.
“We understand a lot about sports, finance, etc., so we want to hook up with them to answer questions or reason over the specifics of them,” Lefkowitz says. “That’s the rationale for wanting to connect up because there is lots of good information out there.”
It’s confusing, and that’s not just the wine talking
But the rub comes in a few ways. For example, some of the problems are fundamental, such as the distinction between a class and an individual; OWL tutorials, for instance, advise treating Zinfandel as a particular wine (an individual) even as they agree that it is a collection or a class.
“You can’t have the same thing playing both roles,” he notes. “The standard is giving people poor design choices.” That introduces confusion that makes it awkward to hook things up with Cyc, which recognizes Zinfandel, in this case, as a specialized collection. The result is a basic incompatibility in structure that requires you to do “some ugly things to match what we think of as Zinfandel to correspond to what they think of as Zinfandel,” he says. That might not be too much of a concern for creating simple mashups where not a lot of reasoning is required, but it is more of an issue for those who really want to take advantage of the meaning of semantics.
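The mismatch Lefkowitz describes can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: triples are plain tuples rather than real OWL, and names such as `Bottle42`, `instances_of`, and `roles` are hypothetical and come from neither Cyc nor any OWL tutorial. It models Zinfandel once as an individual and once as a collection, then shows that the same question gets different answers and that merging the two leaves one term in both roles.

```python
# Triples are plain (subject, predicate, object) tuples; all names are illustrative.
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Modeling A: Zinfandel is an individual, i.e. an instance of Wine.
individual_style = {("Zinfandel", TYPE, "Wine")}

# Modeling B: Zinfandel is a collection (a subclass of Wine) with its own instances.
collection_style = {
    ("Zinfandel", SUBCLASS, "Wine"),
    ("Bottle42", TYPE, "Zinfandel"),  # hypothetical instance
}


def instances_of(cls, triples):
    """Return everything typed as cls or as one of its (transitive) subclasses."""
    classes = {cls}
    changed = True
    while changed:
        changed = False
        for s, p, o in triples:
            if p == SUBCLASS and o in classes and s not in classes:
                classes.add(s)
                changed = True
    return {s for s, p, o in triples if p == TYPE and o in classes}


def roles(term, triples):
    """Report whether a term is used as a class, as an individual, or as both."""
    found = set()
    for s, p, o in triples:
        if p == SUBCLASS and term in (s, o):
            found.add("class")
        if p == TYPE and o == term:
            found.add("class")
        if p == TYPE and s == term:
            found.add("individual")
    return found


# "Which things are wines?" has a different answer under each modeling.
wines_a = instances_of("Wine", individual_style)
wines_b = instances_of("Wine", collection_style)

# Naively merging the two sources leaves Zinfandel playing both roles at once.
merged_roles = roles("Zinfandel", individual_style | collection_style)
```

Under modeling A the only wine is Zinfandel itself; under modeling B it is the hypothetical bottle. A system that reasons over the merged data has to reconcile the two before the answers mean anything, which is roughly the kind of "ugly" matching work the quote alludes to.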
Other issues that tend to rear up include relationship information that may work well for ontologies created in isolation, to fit a particular task, domain or tools, but introduces confusion when trying to connect things together in a broader context.
“There’s a bunch of reasoning you can do if you understand the semantics of relationships, so relationships need to be as rich as the concepts,” he says. Also important, but still not widely recognized today as parties build their own ontologies, is that context must be explicit so that other systems that don’t know the assumptions behind the information can reason with that data.
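As a rough illustration of what richer relationship semantics buy, the sketch below assumes a hypothetical `locatedIn` relation that is declared transitive and a vaguer `relatedTo` relation that carries no declared semantics. The relation names, the `TRANSITIVE` registry, and the facts are all assumptions made for the example, not anything taken from Cyc or OWL.

```python
def transitive_closure(pairs):
    """Repeatedly chain (a, b) and (b, d) into (a, d) until nothing new appears."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Hypothetical registry of relations whose semantics say "this is transitive".
TRANSITIVE = {"locatedIn"}

# The same two asserted facts, stored under either relation name.
facts = {("San Jose", "California"), ("California", "United States")}


def infer(relation_name, pairs):
    """Derive extra facts only when the relation's semantics allow it."""
    if relation_name in TRANSITIVE:
        return transitive_closure(pairs)
    return set(pairs)


rich = infer("locatedIn", facts)   # semantics known: a new fact can be derived
bare = infer("relatedTo", facts)   # semantics unknown: only the asserted facts
```

With the transitive declaration, the pair ("San Jose", "United States") follows even though nobody asserted it; with the undeclared relation, a reasoner has nothing to go on beyond what was typed in.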
How to fix this? “Part of it is that you start worrying about this in the modeling, think beyond the particular application,” he advises, and start linking up to standards where they exist. As standards like OWL evolve, these issues around relationships, context, and classifications need to be addressed.
“It’s a conflict between keeping it simple as a perfectly good business principle, but then you tend to end up with stovepipes,” he says. “We’ve been through these growing pains and we are doing our best to share some of them as lessons learned.”
