Tagging and the Semantic Web
John Clarke Mills
SemanticWeb.com Contributor
A while back I commented on a TechCrunch article quoting Twine CEO Nova Spivack about keyword searches in the Semantic Web space. My comment was later quoted on the blog of Faviki, a semantic startup that tags web pages with semantic data from Wikipedia. I thought it would be useful to go into more depth here on semantic tagging and what we’ve learned so far.
Tags the way they are implemented today
The better Web 2.0 sites implement tags using faceting. In a nutshell, faceting lets you group documents or objects by their attributes: for example, the collection of all documents tagged both ‘George Bush’ and ‘Washington’. The problem is that these attributes have little or no value on their own, and computers certainly do not understand them. They are just strings denoting some kind of concept. Here is a short list of limitations that the Semantic Web will address:
* Tags do not provide enough metadata to support meaningful comparisons.
* Beyond their origin, tags carry little information; more is needed.
* Tags are essentially a full-text search mechanism, although faceting helps.
* Tags need richer relationships to each other and to the objects they describe.
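To make the faceting idea concrete, here is a minimal Python sketch, assuming tags are stored as plain strings in an inverted index. The document ids and tags are hypothetical. Faceting reduces to set intersection over that index, and nothing in it knows that the two ‘Washington’ tags may refer to different places.

```python
from collections import defaultdict

# Hypothetical documents carrying plain string tags (the Web 2.0 model).
documents = {
    "doc1": {"George Bush", "Washington"},  # presumably Washington, D.C.
    "doc2": {"George Bush", "Texas"},
    "doc3": {"Washington", "beets"},        # presumably Washington state
}

# Inverted index: tag string -> ids of the documents carrying it.
index = defaultdict(set)
for doc_id, tags in documents.items():
    for tag in tags:
        index[tag].add(doc_id)

def facet(*tags):
    """Return ids of documents that carry every requested tag (facet intersection)."""
    result = set(documents)
    for tag in tags:
        result &= index.get(tag, set())
    return result

print(sorted(facet("George Bush", "Washington")))  # ['doc1']
print(sorted(facet("Washington")))                 # ['doc1', 'doc3']
```

The second query shows the limitation: both documents match the same string, and the index has no way to tell the city from the state.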
The solution: Tags as objects
Allowing users to tag an object with another object lets us make far more interesting comparisons; discerning more information about the original object becomes simple and accurate. With this kind of interrelationship we can pivot through the data like never before, not with full-text search but with object graph links that both machines and humans can understand.
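A minimal sketch of tags as objects, assuming a tiny in-memory triple store rather than any particular RDF library. The ids (`note:42`, `person:william-gates-farmer`, etc.) and predicates (`taggedWith`, `livesIn`) are hypothetical. Pivoting is a graph traversal: from an item to the objects it is tagged with, then back out to every other item linked to those same objects.

```python
from collections import defaultdict

class Graph:
    """A tiny in-memory triple store of (subject, predicate, object) edges."""

    def __init__(self):
        self.out = defaultdict(set)  # (subject, predicate) -> objects
        self.inc = defaultdict(set)  # (object, predicate) -> subjects

    def add(self, subject, predicate, obj):
        self.out[(subject, predicate)].add(obj)
        self.inc[(obj, predicate)].add(subject)

    def objects(self, subject, predicate):
        return self.out.get((subject, predicate), set())

    def subjects(self, predicate, obj):
        return self.inc.get((obj, predicate), set())

g = Graph()
# Hypothetical items, each tagged with an object id instead of a string.
g.add("note:42", "taggedWith", "person:william-gates-farmer")
g.add("bookmark:7", "taggedWith", "person:william-gates-farmer")
# The tag object carries its own machine-readable metadata.
g.add("person:william-gates-farmer", "livesIn", "place:washington-state")

def related_items(g, item):
    """Pivot: item -> its tag objects -> every other item tagged with them."""
    related = set()
    for tag in g.objects(item, "taggedWith"):
        related |= g.subjects("taggedWith", tag)
    related.discard(item)
    return related

print(related_items(g, "note:42"))                              # {'bookmark:7'}
print(g.objects("person:william-gates-farmer", "livesIn"))  # {'place:washington-state'}
```

Keeping both outgoing and incoming indexes is what makes the pivot cheap in either direction; a production system would more likely use an RDF store, but the traversal is the same idea.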
Let’s go over an example:
Let’s say a user adds a note to our system ranting about a beet farmer named William Gates who lives in Washington state. The user goes on to discuss his beets and farming techniques in great detail, mentioning nothing about software or Windows Vista, of course. In the current Internet model, the user would tag this note with strings like ‘William Gates’, ‘Bill Gates’, ‘beets’, etc.
Now another user comes along and starts digging through documents tagged ‘Bill Gates’ to find new articles about Vista. Unfortunately, many searches will turn up bad results, especially if the phrase ‘Bill Gates’ appears often enough in the document about beets. The other direction works more as intended, though: searching on the tags ‘Bill Gates’ and ‘beets’ together would yield more predictable results.
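The ambiguity above can be reproduced with a short sketch, assuming notes carry exact-match string tags (the note ids are hypothetical). It does not model full-text ranking or term density, only the fact that one string matches two unrelated people; adding a second tag narrows the result, which is the ‘other direction’ described above.

```python
# Hypothetical notes tagged with plain strings, as in the current Web model.
notes = {
    "beet-note": {"William Gates", "Bill Gates", "beets", "farming"},
    "vista-review": {"Bill Gates", "Windows Vista", "Microsoft"},
}

def search(*tags):
    """Return ids of notes carrying every given string tag (exact match)."""
    wanted = set(tags)
    return sorted(nid for nid, note_tags in notes.items() if wanted <= note_tags)

print(search("Bill Gates"))           # ['beet-note', 'vista-review']
print(search("Bill Gates", "beets"))  # ['beet-note']
```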
In the Semantic Web model, the document about William Gates (the beet farmer) would be tagged with the William Gates object, which could contain a plethora of metadata, including his location, occupation, etc. Now when we look at this document there is no guessing what it refers to, especially from a machine’s point of view. This is exactly what the Semantic Web was built for. In this model we are not relying on linguistics, natural language processing, or full-text search; we are relying on hard links that machines can understand and relate to.
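Continuing the same hypothetical example, here is a sketch in which tags are entity objects with their own metadata. The `Entity` class, its fields, the ids, and the metadata values are illustrative assumptions. Both entities share the name ‘William Gates’, but because matching is done on the object’s id, a search for one can no longer return notes about the other.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    """A tag that is an object: a stable id plus machine-readable metadata."""
    id: str
    name: str
    occupation: str
    location: str

# Hypothetical entities: same name, different objects (illustrative metadata).
FARMER = Entity("person:william-gates-farmer", "William Gates", "beet farmer", "Washington state")
FOUNDER = Entity("person:bill-gates-microsoft", "William Gates", "Microsoft co-founder", "Washington state")

# Notes are now tagged with entity objects instead of strings.
notes = {
    "beet-note": {FARMER},
    "vista-review": {FOUNDER},
}

def tagged_with(entity):
    """Return ids of notes tagged with exactly this entity (matched by id, not name)."""
    return sorted(nid for nid, tags in notes.items()
                  if any(t.id == entity.id for t in tags))

print(tagged_with(FOUNDER))  # ['vista-review']

# Metadata travels with the tag, so a machine can read who the note is about.
for tag in notes["beet-note"]:
    print(tag.occupation, "in", tag.location)  # beet farmer in Washington state
```

Making the dataclass frozen keeps entities hashable, so they can sit in sets just as string tags did.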
The disambiguation page (formerly the tag page in Web 2.0)
What about regular string tags? The thought is that the Semantic Web can’t possibly understand everything, and as a matter of fact, that’s true. As a result, Twine still supports regular string tagging. Some words, like adjectives and verbs, are not proper nouns and are less concrete; they may not yet deserve their own objects. However, before we throw in the towel, let’s think about actual language for a second, i.e., the semantics behind how we describe things.
Take the adjective ‘cool’. First of all, what are you looking for? Nouns? A grouping of multiple nouns? Probably ‘cool’ nouns, in fact. A search on this tag could turn up anything and everything at many different levels. It could start by pulling in a definition from Wikipedia. Then it could list groups tagged ‘cool’, like the ‘Super Cars’ group or the ‘Fast Cars’ group. It would also show you which users and which documents have been tagged ‘cool’. But it gets really interesting when you find the ‘cool’ string tag on a tag object. Now you can find proper-noun tags like ‘Ferrari’ as well as ‘Super Cars’ (the proper noun).
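The disambiguation page described above could be assembled roughly as follows. This is a sketch under stated assumptions: taggings are stored as hypothetical (item, kind, tag) rows, and the Wikipedia definition is passed in as a placeholder string instead of being fetched. The page groups everything carrying ‘cool’ by kind, so ‘Super Cars’ shows up both as a group and as a tag object.

```python
from collections import defaultdict

# Hypothetical taggings: (item, kind of item, string tag).
taggings = [
    ("Super Cars", "group", "cool"),
    ("Fast Cars", "group", "cool"),
    ("dave", "user", "cool"),
    ("doc:track-day-notes", "document", "cool"),
    ("Ferrari", "tag_object", "cool"),
    ("Super Cars", "tag_object", "cool"),
    ("Ferrari", "tag_object", "fast"),
]

def disambiguation_page(tag, definition=None):
    """Gather everything carrying a string tag, grouped by the kind of thing it is."""
    by_kind = defaultdict(set)
    for item, kind, t in taggings:
        if t == tag:
            by_kind[kind].add(item)
    # Sort each section so the page renders deterministically.
    return {"definition": definition,
            "sections": {kind: sorted(items) for kind, items in by_kind.items()}}

# The definition string is a placeholder; nothing is fetched here.
page = disambiguation_page("cool", definition="(definition from Wikipedia)")
print(page["sections"]["group"])       # ['Fast Cars', 'Super Cars']
print(page["sections"]["tag_object"])  # ['Ferrari', 'Super Cars']
```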
Joining these tags in a search would yield detailed results from rich metadata, such as a list of Ferraris over the years represented as objects. Each car object would contain detailed specs on engine type, weight, horsepower, etc. Then, by examining the ‘Ferrari Enzo’ object, we can find all the people who used this tag on their bookmarks, links, documents, or other objects they created. With this information you can connect with these people, join their groups, and further your search for whatever interests you. The point is that everything is related at many different levels, and what links things together are the adjectives and verbs that describe them.
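The join and pivot described above might look like this, assuming car objects with a few illustrative spec fields and hypothetical usage records linking users’ content to those objects. `join` combines a string tag (‘cool’) with an object attribute (make ‘Ferrari’), and `people_using` pivots from the ‘Ferrari Enzo’ object to the people who tagged their own content with it. All names and records here are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Car:
    """A car represented as an object; the spec field is illustrative only."""
    name: str
    make: str
    engine: str

ENZO = Car("Ferrari Enzo", "Ferrari", "V12")
F40 = Car("Ferrari F40", "Ferrari", "twin-turbo V8")
MODEL_T = Car("Ford Model T", "Ford", "inline-4")

# Hypothetical string tags attached to car objects.
string_tags = {ENZO: {"cool", "fast"}, F40: {"cool"}, MODEL_T: {"classic"}}

# Hypothetical user content tagged with car objects: (user, content id, car).
usages = [
    ("alice", "bookmark:enzo-review", ENZO),
    ("bob", "doc:enzo-photos", ENZO),
    ("carol", "link:f40-history", F40),
]

def join(adjective, make):
    """Car objects of the given make that also carry the given string tag."""
    hits = (c for c, tags in string_tags.items() if c.make == make and adjective in tags)
    return sorted(hits, key=lambda c: c.name)

def people_using(car):
    """Pivot from a tag object to the people who tagged their content with it."""
    return sorted({user for user, _, c in usages if c == car})

print([(c.name, c.engine) for c in join("cool", "Ferrari")])
# [('Ferrari Enzo', 'V12'), ('Ferrari F40', 'twin-turbo V8')]
print(people_using(ENZO))  # ['alice', 'bob']
```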
Conclusions
Being able to come at your data from every angle is important. Everyone thinks differently, and therefore everyone searches differently. The truth is, it will be quite a while before machines really start to understand what we humans are talking about. It is up to us to help organize data in a machine-readable format so the machines can share it; in return, it allows us to perform incredible searches like never before. One day all this work will pay off and machines may be able to comprehend what the word ‘cool’ means; until then, it is up to us to think for ourselves.
John Clarke Mills is an application engineer at San Francisco startup Radar Networks, which is attempting to bring the Semantic Web to life with its first commercial product, Twine.com. Twine is a new service that helps you organize, share and discover information about your interests with networks of like-minded people. John began his career as an engineer at CNET Networks before joining Radar.
