Reasoning About Semantics
Fred Wild
SemanticWeb.com Contributor
Semantic web concepts carry quite a bit of enthusiasm and hope: the hope is that semantic web ways and means can help us make sense of the vast ocean of resources on the Internet, or perhaps make sense of our smaller seas of resources within our corporate data centers.
Topping the list of “things the semantic web is supposed to provide” is context-sensitive search. Now, I purposefully did not say “semantic search,” simply because I want to describe how to reason about semantics and so I need to use other terms. I chose to use the term “context” to illustrate how semantics can be applied. Think of ontologies (which describe types of things, or resources, and their properties) as a way of establishing a context. If you adopt a semantic context, the things you find when you search, as well as the properties you uncover about the things you find, belong to that context.
As such, you can think about the semantics used in searching as a sort of lens through which you see matching resources. Which is to say: without a semantic context, documents are just documents, undistinguished from each other in any contextual way. Perhaps you can search for documents by filetype and perhaps also limit the search to those files containing certain words, but this is a lexical search, not a semantic one.
To understand the difference, consider that when I search for a dwelling in a certain price range, I want to see all of the things that mean dwelling (home, dwelling, house, residence, condo, cottage, townhouse, …). An ontology describing Real Estate might establish an equivalence between Dwelling and these other classes, so my search finds things of similar meaning, not just the string of characters making up a word.
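To make the difference concrete, here is a minimal sketch in Python. The equivalence table, the listings, and the function names are all hypothetical and invented for illustration; a real ontology would live in a proper ontology language rather than in a dictionary.

```python
# Hypothetical mini-ontology: a class name mapped to the classes declared
# equivalent to it. Invented for illustration only.
EQUIVALENT_CLASSES = {
    "dwelling": {"home", "dwelling", "house", "residence",
                 "condo", "cottage", "townhouse"},
}

# Hypothetical listings; "kind" is the word used in each listing.
LISTINGS = [
    {"kind": "condo", "price": 250_000},
    {"kind": "house", "price": 410_000},
    {"kind": "warehouse", "price": 300_000},
    {"kind": "cottage", "price": 180_000},
]


def lexical_search(listings, term, low, high):
    """Match only the literal string, within the price range."""
    return [item for item in listings
            if item["kind"] == term and low <= item["price"] <= high]


def semantic_search(listings, term, low, high):
    """Match every class the ontology treats as equivalent to the term."""
    kinds = EQUIVALENT_CLASSES.get(term, {term})
    return [item for item in listings
            if item["kind"] in kinds and low <= item["price"] <= high]


print(lexical_search(LISTINGS, "dwelling", 150_000, 300_000))
print(semantic_search(LISTINGS, "dwelling", 150_000, 300_000))
```

With these sample listings, the lexical search finds nothing because no listing literally says “dwelling,” while the semantic search returns the condo and the cottage. The warehouse is skipped because it is not in the equivalence set, and the house because it falls outside the price range.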
Turning to a document example, I may want to search for all of the documents in my document repository that are legal documents, and even particular types of legal documents. One way I can imagine doing this is to first set my search context to use an ontology describing legal artifacts. Then, using this context, I can ask to see all of the Litigation documents.
It may sound magical, but it is in fact quite mechanical. Within my ontology for legal artifacts, a document is a litigation document if it is either explicitly tagged as such (a member of the class Litigation Document) or matches criteria from which we can infer, with high likelihood, that its contents are those of a litigation document. Although the inference criteria may not be perfect, they can be refined over time. Tuning the criteria lets us find legal documents of a specific type much more easily than a brute-force search would. Also, when we work with a predetermined set of ontologies, automated mechanisms can run ahead of time and apply the criteria to pre-classify the documents according to those ontologies.
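The mechanics can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the keyword list, the threshold of three matching terms, and the names `Document`, `is_litigation_document`, and `preclassify` do not come from any particular product, and real inference criteria would be richer than keyword counting.

```python
from dataclasses import dataclass, field

# Hypothetical inference criteria: terms that suggest a litigation document,
# and how many distinct terms must appear before we infer membership.
LITIGATION_TERMS = {"plaintiff", "defendant", "complaint", "motion", "subpoena"}
THRESHOLD = 3

LITIGATION_CLASS = "Litigation Document"


@dataclass
class Document:
    name: str
    text: str
    tags: set = field(default_factory=set)


def is_litigation_document(doc, threshold=THRESHOLD):
    """True if the document is explicitly tagged or matches the criteria."""
    if LITIGATION_CLASS in doc.tags:
        return True  # explicit class membership
    words = {w.strip(".,;:()").lower() for w in doc.text.split()}
    return len(words & LITIGATION_TERMS) >= threshold  # inferred membership


def preclassify(docs):
    """Apply the criteria ahead of time and record the result as a tag."""
    for doc in docs:
        if is_litigation_document(doc):
            doc.tags.add(LITIGATION_CLASS)
    return docs
```

Refining the criteria over time then amounts to editing `LITIGATION_TERMS` or `THRESHOLD` and running `preclassify` again.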
Inference is most valuable in cases where the authors of documents, or their authoring tools, don’t help very much in establishing the context and meaning of documents. An author could support semantic search by supplying meaningful metadata along with the contents of the document. More sophisticated authoring approaches may one day have authors doing this routinely, but for now, let’s talk more about discovering meaning and applying classifications to existing documents.
I recently submitted a friend’s resume to an online employee referral site. It was able to scan the uploaded text, pull out the educational history of the individual (among other things), and present it for verification with very good accuracy. This sophisticated scan of the document is an example of extracting facts that go beyond simple word matching and that are useful in semantics-based searches. It is clear that one could later combine this capability with a localized ontology of colleges and universities to express something like, “Show me all of the resumes of candidates who graduated from a Preferred School holding a Postgraduate Degree in an Engineering Discipline,” assuming definitions of Postgraduate Degree, Engineering Discipline and Preferred School within that ontology.
On top of applying a fundamental group of classifications, we can build up layered semantics by treating classes as sets of things, and using set membership expressions to narrow down the set of individuals to exactly the ones we want. We do this via the familiar means of union, intersection and complement.
Our example ontology can be expressed as a number of classes that are relevant to sets of education credentials.
Resume – the documents uploaded; we assume they are in fact resumes
Candidate – the person to whom the resume pertains
School – the set of all colleges and universities
Preferred School – the subset of Schools that we prefer; membership explicitly declared
Degree – the set of credentials awarded to graduates of the members of School
Degree From Preferred School – subset of Degree: those awarded by members of Preferred School
Postgraduate Degree – { Masters, Doctorate }
Qualified Degree – { Postgraduate Degree } AND { Degree From Preferred School }
Candidate Degree – the set of Degrees possessed by the Candidate, extracted from their Resume
Qualified Candidate – a candidate who possesses at least one member of the derived set {Candidate Degree} AND {Qualified Degree}
It should not be surprising that this looks like a collection of rules, even arbitrary ones in some cases. For example, our collection of preferred schools is simply an enumerated list of schools from which we prefer to get our candidates. We can add to or remove from this list as our experience with schools matures. If we do change the list of preferred schools, the results will be different. This is because we have changed what it means to be a preferred school, which in turn changes which degrees count as qualified.
It is also possible that postgraduate degrees come in other forms besides masters or doctorates. When we add those, our results will also differ, because we have changed what Postgraduate Degree means and therefore what it contributes to the classification exercise.
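The classes listed earlier, and the effect of changing their definitions, can be sketched with ordinary Python sets. This is only an illustration under stated assumptions: the candidates, schools, and degree levels are invented, and `Degree` and `qualified_candidates` are hypothetical names rather than part of any ontology toolkit.

```python
from typing import NamedTuple


class Degree(NamedTuple):
    candidate: str  # person named on the resume
    level: str      # e.g. "Bachelors", "Masters", "Doctorate"
    school: str     # the awarding school


def qualified_candidates(degrees, preferred_schools, postgraduate_levels):
    """Return the candidates holding at least one Qualified Degree."""
    postgraduate = {d for d in degrees if d.level in postgraduate_levels}
    from_preferred = {d for d in degrees if d.school in preferred_schools}
    # Qualified Degree = Postgraduate Degree AND Degree From Preferred School
    qualified = postgraduate & from_preferred

    result = set()
    for name in {d.candidate for d in degrees}:
        candidate_degrees = {d for d in degrees if d.candidate == name}
        # Qualified Candidate: the intersection is not empty
        if candidate_degrees & qualified:
            result.add(name)
    return result


# Hypothetical facts, as if extracted from uploaded resumes.
DEGREES = {
    Degree("Ana", "Masters", "Alpha University"),
    Degree("Ben", "Bachelors", "Alpha University"),
    Degree("Cy", "Doctorate", "Beta College"),
    Degree("Dee", "Graduate Certificate", "Alpha University"),
}
PREFERRED = {"Alpha University"}
POSTGRADUATE = {"Masters", "Doctorate"}

print(qualified_candidates(DEGREES, PREFERRED, POSTGRADUATE))
print(qualified_candidates(DEGREES, PREFERRED | {"Beta College"}, POSTGRADUATE))
```

With these sample facts, the first call finds only Ana. Adding Beta College to the preferred schools brings in Cy as well, and adding “Graduate Certificate” to the postgraduate levels would bring in Dee. The function itself never changes; only the meaning of the classes does.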
If you now go back and consider the business context of all the different kinds of documents that you have under management, and the various things that can be said about them, you might imagine how domain-specific ontologies of your own could be applied to classify and find documents according to your own criteria.
Creating good ontologies is a careful modeling exercise. The skills for doing this kind of work are generally found among people working in the role of Business Analyst.
Allow me to set an expectation: wielding ontology mechanisms is not the difficult part of applying semantics. The difficult part is the human element in agreeing on what a system of classifications should be.
Classifications are always based on perspective. In some cases these perspectives can be merged into a single ontology. In other cases the perspectives are different enough to make up domain-specific ontologies of their own. The activity of modeling ontologies takes practice and patience, but once you have created a good collection of ontologies, what they enable can make you wonder how you lived without them.
Fred Wild is a senior technologist in the EMC Office of the CTO.
