Young Guns Driving Semantic Web (Part 2)
Jennifer Zaino
SemanticWeb.com Contributor
Christian Halaschek-Wiener, Aditya Kalyanpur, and Jennifer Golbeck have good reason to be excited about the future of the semantic web: they have been actively contributing to its development. (For more on how they got started, see Young Guns Driving Semantic Web (Part 1).) Here’s a look at the work the three are doing.
Christian Halaschek-Wiener
Halaschek-Wiener, now CTO at a financial-domain startup, has been recognized for his semantic web work in the financial services sector. For his dissertation, he developed a content dissemination framework built on the semantic web standard OWL (Web Ontology Language) that uses OWL’s reasoning capabilities to match newly published information with users’ interests. The syndication framework offers many benefits over traditional approaches that rely on syntactic matching techniques (such as keyword or XML-based matching), he says, including the ability to infer additional matches between publications and users’ interests through the description logic reasoning that OWL provides.
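To make the contrast with keyword matching concrete, here is a toy sketch. It assumes a publication has already been annotated with ontology classes and reduces OWL reasoning to following a hand-written subclass hierarchy; the class names are hypothetical, and a real description logic reasoner handles far richer axioms than this.

```python
# Hypothetical class hierarchy: class -> its direct superclasses
SUBCLASS_OF = {
    "CommercialBank": {"Bank"},
    "Bank": {"FinancialInstitution"},
    "InvestmentFirm": {"FinancialInstitution"},
    "FinancialInstitution": {"Company"},
}

def ancestors(cls):
    """Return cls plus every class it is (transitively) a subclass of."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(SUBCLASS_OF.get(c, ()))
    return seen

def keyword_match(subscription, publication_text):
    """Syntactic matching: the subscription string must appear in the text."""
    return subscription.lower() in publication_text.lower()

def semantic_match(subscription_class, publication_types):
    """Subsumption matching: some asserted type is a subclass of the subscription."""
    return any(subscription_class in ancestors(t) for t in publication_types)

text = "Regional lender posts record quarterly profit"
types = {"CommercialBank"}  # assumed annotation on the story
print(keyword_match("FinancialInstitution", text))    # False
print(semantic_match("FinancialInstitution", types))  # True
```

The keyword matcher misses the story because the subscribed term never appears in it; the subsumption check finds it because a commercial bank is inferred to be a financial institution.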
“There’s been talk of doing this more expressive syndication, and the hard problem I was trying to solve was how to make this more expressive syndication practical for high-demand syndication domains,” such as in the financial services area, Halaschek-Wiener says. “There are tons of financial information publishers. But there are time constraints on how long it takes to process newly published information and integrate it into whatever other system is on the other end of the line. Time is money from an analyst point of view, and so to process the information in a timely manner is of critical importance.”
For all its advantages, OWL reasoning introduces computational overhead, which is problematic in a dynamic environment with a constant inflow of news stories, each of which must be checked for matches against users’ queries. For example, an analyst may want to see any news related to a certain class of companies. Assuming the news stories are encoded in OWL, the system Halaschek-Wiener envisioned would determine, whenever a new publication enters the framework, whether it matches each subscriber’s requirements.
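One generic way to keep per-publication work small is sketched below, under stated assumptions: subscriptions are single class names, the class hierarchy is small and acyclic, and superclass sets are cached after the first lookup. This is an illustrative indexing trick, not the incremental OWL reasoning algorithms from Halaschek-Wiener’s dissertation, and all names are hypothetical.

```python
from collections import defaultdict
from functools import lru_cache

# Hypothetical, acyclic class hierarchy: class -> direct superclasses
SUBCLASS_OF = {
    "Bank": ("FinancialInstitution",),
    "Insurer": ("FinancialInstitution",),
    "FinancialInstitution": ("Company",),
    "Automaker": ("Company",),
}

@lru_cache(maxsize=None)
def ancestors(cls):
    """cls plus all transitive superclasses, cached so repeat lookups are cheap."""
    result = {cls}
    for parent in SUBCLASS_OF.get(cls, ()):
        result |= ancestors(parent)
    return frozenset(result)

class Dispatcher:
    def __init__(self):
        # class name -> ids of subscribers interested in that class
        self.index = defaultdict(set)

    def subscribe(self, user, cls):
        self.index[cls].add(user)

    def publish(self, asserted_types):
        """Match one new publication: only its own types' superclasses are examined."""
        matched = set()
        for t in asserted_types:
            for c in ancestors(t):
                matched |= self.index.get(c, set())
        return matched

d = Dispatcher()
d.subscribe("analyst_a", "FinancialInstitution")
d.subscribe("analyst_b", "Automaker")
d.subscribe("analyst_c", "Company")
print(sorted(d.publish({"Bank"})))  # ['analyst_a', 'analyst_c']
```

The design choice is to invert the problem: instead of testing every subscription against each story, the story’s inferred classes are used as keys into an index of subscriptions.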
Dow Jones Newswires, a leading provider of global business news and information services, has a historical database of news feed articles going back about 20 years, all of them in XML with metadata tags. The company furnished him with its datasets and the subscriptions of interest for its domains and clients, which allowed him to apply his framework to a high-demand syndication domain. This enabled a real-world assessment of the scalability of the framework in general, as well as of the incremental OWL reasoning algorithms he developed in his dissertation to make the framework practical.
“The results were promising, and we showed that, given some realistic ontologies and domain modeling, we could keep up with the news wires,” he says.
Aditya Kalyanpur
As a graduate student, Kalyanpur says, his most interesting work was on the debugging and repair of RDF/OWL datasets. Before this work, he says, OWL reasoning tools would only tell users about errors in their data, with no explanation of how the errors came about or how to fix them. Given the complex logic underlying OWL, users found these errors very difficult to understand and debug, he says. His work focused on building a set of OWL debugging tools that explain the precise cause of ontology errors in a user-friendly manner and suggest appropriate repair plans.
“Interestingly, the solutions developed are of a more general nature and can be used to explain any logical inference (not just an error) that follows from an RDF/OWL dataset,” he says. “Thus, for example, any OWL reasoning engine that does semantic query answering over an associated RDF data store can use the techniques developed to justify/explain an answer to the query, which can be very helpful to end users.”
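The idea of explaining an entailment can be illustrated with the simple “shrink” strategy for finding one minimal justification: drop each axiom in turn and leave it out if the entailment still holds. In the sketch below, entailment is only reachability in a subclass graph, a stand-in for a real OWL reasoner; the axioms are hypothetical, and this is not a description of Kalyanpur’s actual tools.

```python
def entails(axioms, sub, sup):
    """True if 'sub is a subclass of sup' follows from (sub, super) axioms by transitivity."""
    frontier, seen = [sub], {sub}
    while frontier:
        c = frontier.pop()
        if c == sup:
            return True
        for a, b in axioms:
            if a == c and b not in seen:
                seen.add(b)
                frontier.append(b)
    return False

def justification(axioms, sub, sup):
    """Shrink the axiom list to one minimal subset that still entails sub <= sup."""
    if not entails(axioms, sub, sup):
        return None
    kept = list(axioms)
    for ax in list(kept):
        trial = [x for x in kept if x != ax]
        if entails(trial, sub, sup):
            kept = trial  # ax is not needed to explain this inference
    return kept

# Hypothetical ontology fragment as (subclass, superclass) axioms
ontology = [
    ("Hippocampus", "BrainRegion"),
    ("BrainRegion", "NervousSystemPart"),
    ("Hippocampus", "CerebralCortexPart"),
    ("Spine", "SkeletalPart"),
]
print(justification(ontology, "Hippocampus", "NervousSystemPart"))
# [('Hippocampus', 'BrainRegion'), ('BrainRegion', 'NervousSystemPart')]
```

Because entailment here is monotonic, one pass suffices: no axiom can be removed from the returned set without losing the inference, so it can be shown to the user as the reason the answer holds.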
Now a researcher at the IBM T.J. Watson Research Center, Kalyanpur finds himself in the thick of thinking about how the semantic web meets the real world. Like Halaschek-Wiener, he is working on the scalability of ontology reasoning on the Web. The team he works with has developed what he describes as a highly scalable and expressive ontology reasoner, known as SHER. SHER can reason over about 7 million triples in seconds and scales to datasets of 300 million triples. SHER can also help users clean up data inconsistencies before issuing semantic queries, and it provides explanations for why a particular result set is an answer to a query, he says.
IBM has used SHER in two large-scale ontology reasoning applications. The first is finding electronic patient records that match clinical trial criteria.
“The problem in automating clinical trials matching is that patient data is noisy, coded in local terminologies, and highly specific. Clinical trials queries, however, tend to be much more general,” he says. “Together with researchers from Columbia University Medical Center, we used the knowledge in the SNOMED ontology to bridge the gap between electronic medical records and clinical trials queries. SHER successfully found matches for the clinical trials queries on a large one-year patient dataset from Columbia (60 million triples) in minutes.”
The second ontology reasoning application is in the context of SemanticClean, a project that uses OWL reasoning to clean up inconsistencies in data generated by text extraction. He says SHER was able to detect several thousand inconsistencies in large amounts of noisy text extraction data in minutes.
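Detecting inconsistencies in extracted data often comes down to spotting an entity that, after inference, belongs to two classes declared disjoint. The sketch below assumes hypothetical type assertions, a tiny single-parent class hierarchy, and hand-picked disjointness pairs. SemanticClean and SHER work over full OWL ontologies, so treat this only as an illustration of why inference matters: neither of the conflicting types is asserted directly.

```python
# Hypothetical class hierarchy: class -> its single direct superclass
SUBCLASS_OF = {"CapitalCity": "City", "Politician": "Person"}

# Hypothetical disjointness axioms: no entity may belong to both classes
DISJOINT = {frozenset({"City", "Person"}), frozenset({"Organization", "Person"})}

def with_superclasses(cls):
    """Return cls together with all of its superclasses."""
    out = {cls}
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        out.add(cls)
    return out

def find_conflicts(facts, disjoint_pairs):
    """Report entities whose inferred types violate a disjointness axiom."""
    types = {}
    for entity, cls in facts:
        types.setdefault(entity, set()).update(with_superclasses(cls))
    conflicts = []
    for entity, classes in types.items():
        for pair in disjoint_pairs:
            if pair <= classes:
                conflicts.append((entity, tuple(sorted(pair))))
    return conflicts

# Hypothetical (entity, class) assertions produced by text extraction
extracted = [
    ("Paris", "CapitalCity"),
    ("Paris", "Politician"),  # likely an extraction error
    ("IBM", "Organization"),
    ("Obama", "Politician"),
]
print(find_conflicts(extracted, DISJOINT))  # [('Paris', ('City', 'Person'))]
```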
An upcoming IBM alphaWorks web service, Anatomy Lens, also has him excited. The next-generation search engine, which runs on SHER technology, aims to help scientists home in on the articles most relevant to their research. “In the spirit of the Semantic Web, we have integrated multiple datasets and ontologies: Foundational Model of Anatomy (FMA), Gene Ontology (GO), Gene Ontology Annotations (GOA), MeSH and PubMed. The power of the search and inferencing comes from this integration.”
Users enter MeSH terms, anatomical terms, and biological processes as search keywords. For example, a query could ask for articles related to “Alzheimer’s, brain, and neuron development.” Anatomy Lens conducts a semantic search with higher precision and better recall than text search, he says. In the example, Anatomy Lens will also return Alzheimer’s articles that discuss dendrite development in the hippocampus; in contrast, a standard text search will find only articles containing the explicitly queried keywords, and may also return unrelated articles.
The higher recall in Anatomy Lens comes from using ontologies as knowledge bases to expand the search, he notes. Through inferencing, it can determine that a user interested in the brain is also interested in the hippocampus, but not the spine. To improve precision, Anatomy Lens searches over user-generated metadata on articles instead of keywords. In the above example, an article that mostly discusses the spine but happens to mention the keyword “brain” will be returned by a standard text search, but not by Anatomy Lens.
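The recall and precision argument can be sketched as query expansion over a part-of hierarchy, followed by matching against article annotations rather than free text. The hierarchy, annotation sets, and article ids below are hypothetical, and the sketch ignores the GO, GOA, and MeSH integration that Anatomy Lens relies on.

```python
# Hypothetical part-of hierarchy: part -> the whole it belongs to
PART_OF = {
    "Hippocampus": "Brain",
    "Cerebellum": "Brain",
    "Dendrite": "Neuron",
    "SpinalCord": "CentralNervousSystem",
    "Brain": "CentralNervousSystem",
}

def expand(term):
    """Return term plus every structure that is (transitively) part of it."""
    result, changed = {term}, True
    while changed:
        changed = False
        for part, whole in PART_OF.items():
            if whole in result and part not in result:
                result.add(part)
                changed = True
    return result

# Articles are matched on annotation terms, not on words in their text
articles = {
    "a1": {"Alzheimer", "Hippocampus", "Dendrite"},
    "a2": {"SpinalCord", "Injury"},  # its text might mention "brain" in passing
    "a3": {"Alzheimer", "Cerebellum"},
}

def search(query_terms):
    """An article matches if every expanded query term hits one of its annotations."""
    expanded = [expand(t) for t in query_terms]
    return sorted(doc for doc, tags in articles.items()
                  if all(tags & e for e in expanded))

print(search(["Alzheimer", "Brain", "Neuron"]))  # ['a1']
```

Expanding “Brain” pulls in the hippocampus but not the spinal cord, which mirrors the recall and precision trade-off described above.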
Moreover, the underlying data is huge (about 300 million RDF triples), as are the expressivity and size of the ontologies used for inferencing, yet most typical queries are answered in a few seconds, which demonstrates that large-scale, real-time reasoning is very much possible.
Jennifer Golbeck
Golbeck, now a faculty member at the University of Maryland School of Information Studies, is extending her PhD work on how people in social networks compute the trustworthiness of others and use the information they get from them. She has been developing new methods for computing trust that don’t require individuals to state explicit trust values for others in the network, including looking at FOAF (Friend of a Friend, a machine-readable ontology describing persons, their activities, and their relations to other people and objects) and how it works.
“There are tons of profiles but some of them are dupes,” she says. Only a small fraction of users have multiple accounts, but those few people usually connect big groups. “Can you merge those profiles? I just finished a study in January that looked at taking FOAF and seeing if you can use semantic technologies to merge profiles of people from different social networks. It turns out that you can.”
The first step was to run semantic web reasoning techniques over FOAF data to merge people with multiple profiles and connect about ten big and disconnected social networks (those that output FOAF and have more than 1,000 people).
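A common semantic web technique for merging duplicate profiles is “smushing” on inverse functional properties. The FOAF specification declares foaf:mbox_sha1sum inverse functional, so two resources with the same value denote the same person. The article does not say exactly which rules Golbeck’s study used, so the sketch below is an assumption; the profile data and the dictionary layout are hypothetical.

```python
def merge_profiles(profiles):
    """Merge profiles that share a foaf:mbox_sha1sum value (union-find).

    Returns (canonical, edges): canonical maps each profile id to the id chosen
    to represent that person; edges are foaf:knows links rewritten onto them.
    """
    parent = {p["id"]: p["id"] for p in profiles}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def canon(x):
        # Ids that never appear as a profile are left unchanged.
        return find(x) if x in parent else x

    owner = {}  # mbox_sha1sum value -> first profile id seen with it
    for p in profiles:
        key = p.get("mbox_sha1sum")
        if key is None:
            continue
        if key in owner:
            parent[find(p["id"])] = find(owner[key])  # same person: merge
        else:
            owner[key] = p["id"]

    edges = {(canon(p["id"]), canon(k)) for p in profiles for k in p.get("knows", [])}
    return {pid: find(pid) for pid in parent}, edges

# Hypothetical profiles from two sites; "ab12" stands in for a SHA-1 hash.
profiles = [
    {"id": "siteA#alice", "mbox_sha1sum": "ab12", "knows": ["siteA#bob"]},
    {"id": "siteB#al", "mbox_sha1sum": "ab12", "knows": ["siteB#carol"]},
    {"id": "siteA#bob", "mbox_sha1sum": "cd34"},
    {"id": "siteB#carol", "mbox_sha1sum": "ef56"},
]
canonical, edges = merge_profiles(profiles)
print(sorted(edges))
# [('siteA#alice', 'siteA#bob'), ('siteA#alice', 'siteB#carol')]
```

After merging, one representative for Alice links bob on site A and carol on site B, which is how a single duplicated account can join two otherwise disconnected networks.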
“People use FOAF as an example of how the semantic web had been successful,” she says. “But if you can’t reason over it in an effective way, it’s not a good example of that.” She currently has a paper under submission to show that merging profiles works really well.
That’s good, because if this weren’t possible, the FOAF data wouldn’t be clean enough to serve as a basis for applications that rely on these distributed social networks to compute trust. “The step now is to take the work I’ve done on computing trust from the information you have about people and integrating that with the giant FOAF network, and then building intelligent systems,” she says. One example is the system she built for her dissertation project, which generated movie recommendations based on the explicit trust values that users of a purpose-built social network assigned to one another.
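A trust-weighted average is one simple way to turn explicit trust values into a recommendation: raters the user trusts more count for more. The sketch below assumes trust values between 0 and 1 and a hypothetical min_trust cutoff; it is an illustration, not necessarily the formula used in Golbeck’s dissertation system.

```python
def predict_rating(trust, ratings, min_trust=0.0):
    """Trust-weighted mean of ratings from raters the user trusts.

    trust:   rater -> trust the user places in that rater (0..1)
    ratings: rater -> that rater's rating of the movie
    Raters with trust at or below min_trust (or unknown raters) are ignored.
    """
    num = den = 0.0
    for rater, r in ratings.items():
        t = trust.get(rater, 0.0)
        if t > min_trust:
            num += t * r
            den += t
    return num / den if den else None  # None: no trusted rater has rated it

trust = {"bob": 0.75, "carol": 0.25, "dave": 0.0}
ratings = {"bob": 4, "carol": 2, "dave": 1, "erin": 5}
print(predict_rating(trust, ratings))  # 3.5
```

Here erin’s rating is ignored because the user has no trust value for her; with inferred trust over a large FOAF network, such gaps would shrink.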
Such work could result in some interesting new applications, such as a novel approach to email filtering. “You get messages from people you don’t know all the time, and you want many of those. But I also get zillions of spam email, and some good messages always get filtered into the spam box, and you never read them because there are so many spam emails that you can’t go through them all to find the good ones,” she says. “You can use something like this to highlight the messages that are from people you trust in the network — either those you know directly or that you have paths to in the social network, to sort of sort your spam box.”
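The email idea requires a trust estimate for senders you reach only through the network. The sketch below uses one hypothetical propagation rule: take the strongest path, where a path is only as trustworthy as its weakest link (a max-min variant of Dijkstra’s algorithm), and then sort a spam box by that score. Golbeck’s own trust algorithms may combine paths differently; all names and values are made up.

```python
import heapq

def inferred_trust(graph, source):
    """Bottleneck trust from source to everyone reachable.

    graph: person -> {neighbor: direct trust in [0, 1]}
    A path's trust is its weakest edge; each person keeps their best path.
    """
    best = {source: 1.0}
    heap = [(-1.0, source)]  # max-heap via negated trust
    while heap:
        neg_t, person = heapq.heappop(heap)
        t = -neg_t
        if t < best.get(person, 0.0):
            continue  # stale heap entry
        for nbr, edge in graph.get(person, {}).items():
            cand = min(t, edge)
            if cand > best.get(nbr, 0.0):
                best[nbr] = cand
                heapq.heappush(heap, (-cand, nbr))
    return best

def sort_spam_box(messages, graph, me):
    """Put mail from (directly or indirectly) trusted senders first."""
    trust = inferred_trust(graph, me)
    return sorted(messages, key=lambda m: trust.get(m["from"], 0.0), reverse=True)

graph = {
    "me": {"ann": 0.9, "bo": 0.4},
    "ann": {"cy": 0.7},
    "bo": {"cy": 0.95, "dee": 0.6},
}
spam_box = [
    {"from": "unknown@example.com", "subject": "You won!"},
    {"from": "dee", "subject": "Conference draft"},
    {"from": "cy", "subject": "Lunch?"},
]
for m in sort_spam_box(spam_box, graph, "me"):
    print(m["from"])  # prints cy, dee, unknown@example.com (one per line)
```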