Garbage In … and Other Semantic Web Challenges
Michael Marshall
SemanticWeb.com Contributor
If you’re on the cutting edge of Web technology you may have seen the term semantic web get a few headlines. Or maybe you’re immersed in the “semweb” world. But most folks aren’t, and they’re starting to ask: So what is this semantic web?
The semantic web is a group of technologies and standards for data interchange that make Web content easier for machines to access and interrelate, and easier for both machines and humans to process. The vision is to fulfill more of the Web’s potential by allowing data to be shared effectively by wider communities.
According to the World Wide Web Consortium (W3C):
“The semantic web is about two things. It is about common formats for integration and combination of data drawn from diverse sources, where on the original Web mainly concentrated on the interchange of documents. It is also about language for recording how the data relates to real world objects. That allows a person, or a machine, to start off in one database, and then move through an unending set of databases which are connected not by wires but by being about the same thing.”
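To make the idea of databases "connected by being about the same thing" concrete, here is a minimal Python sketch. It is not a W3C API or a real semantic web toolkit: the two sources, the `ex:` identifiers, the predicate names, and the functions `merge_sources` and `follow` are all hypothetical, chosen only for illustration. Each source states facts as (subject, predicate, object) triples, and because both sources use the same identifier for the same thing, their facts can be combined and traversed.

```python
from collections import defaultdict

# Hypothetical data: each source states facts as (subject, predicate, object) triples.
# The "ex:" identifiers and predicate names are invented for this illustration.
source_a = [
    ("ex:Hamlet", "ex:author", "ex:Shakespeare"),
    ("ex:Hamlet", "ex:type", "ex:Play"),
]
source_b = [
    ("ex:Shakespeare", "ex:bornIn", "ex:StratfordUponAvon"),
    ("ex:Hamlet", "ex:language", "English"),
]


def merge_sources(*sources):
    """Group facts from all sources by the subject they describe."""
    facts = defaultdict(set)
    for source in sources:
        for subject, predicate, obj in source:
            facts[subject].add((predicate, obj))
    return facts


def follow(facts, start, predicate):
    """Return the objects linked from `start` by `predicate`, sorted."""
    return sorted(obj for pred, obj in facts[start] if pred == predicate)


merged = merge_sources(source_a, source_b)

# Start in one source (who wrote Hamlet?) and continue in the other
# (where was that author born?), linked only by the shared identifier.
authors = follow(merged, "ex:Hamlet", "ex:author")
birthplaces = [place for a in authors for place in follow(merged, a, "ex:bornIn")]
print(birthplaces)
```

The two sources never reference each other directly; the link exists only because both use `ex:Shakespeare` for the same person, which is the kind of shared naming the common formats are meant to enable.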
Benefits
As the vision of the semantic web is realized and more companies and programmers embrace it, the benefits will be significant. Both commercial and non-commercial enterprises stand to gain from:
Interconnectivity of diverse data sources,
Gleaning more usable and actionable information from raw data,
The blending of machine categorization of information with human insight and expertise, and
Improved functionality of Internet and intranet search engines.
Challenges
One of the main challenges the semantic web faces lies in the role that people play: the human contributions that help explain to machines the relationships between data. There is a tendency to put excessive trust in “computerized” data, and a propensity for individuals to accept blindly whatever comes from the computer.
You’ve no doubt heard the term GIGO (Garbage In, Garbage Out). As input and contributions come from a multitude of users via tagging, behavioral data, and other forms, at least three semantic web issues will need attention:
Incorrect tagging,
Malicious tagging, and
Spam tagging.
To the extent that results returned to an Internet user are influenced by input from other users, those results may become skewed by any one of those factors or by a combination of them. Both preventative and corrective steps will have to be taken to address these potential problems as much as possible.
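As one illustration of what a preventative measure might look like, the following Python sketch counts a tag only once several distinct users have applied it to the same item. This is an assumption made for the example, not a measure described by any particular system: the function name `trusted_tags`, the `min_users` threshold, and the sample events are all hypothetical.

```python
from collections import defaultdict


def trusted_tags(tag_events, min_users=2):
    """Keep, per item, only tags applied by at least `min_users` distinct users.

    `tag_events` is an iterable of (user, item, tag) tuples. Tags are
    normalized by trimming whitespace and lowercasing before counting.
    """
    users_by_tag = defaultdict(set)
    for user, item, tag in tag_events:
        users_by_tag[(item, tag.strip().lower())].add(user)

    kept = defaultdict(set)
    for (item, tag), users in users_by_tag.items():
        if len(users) >= min_users:
            kept[item].add(tag)
    return {item: sorted(tags) for item, tags in kept.items()}


# Hypothetical tagging events: two users agree on one tag,
# while a single user repeats a spam tag.
events = [
    ("alice", "page1", "Shakespeare"),
    ("bob", "page1", "shakespeare"),
    ("mallory", "page1", "cheap-pills"),
    ("mallory", "page1", "cheap-pills"),
]
print(trusted_tags(events))
```

A threshold like this dampens spam from a single account, since repeats by the same user count once, but it would not stop coordinated malicious tagging or an honest mistake that many users share, so corrective review would still be needed.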
Related social implications
Because search results returned to an Internet user are influenced by the relevant data from other individuals, do we lose factual accuracy or objectivity in the search results? This is one example of a potentially negative influence from the human element of the semantic web.
In one form of postmodernism or pragmatism, meaning is a product of whatever linguistic community you’re in, and there is nothing beyond that which you should seek because there is nothing beyond that to be had: no truth with a capital T. In the semantic web, are the contributors akin to the linguistic community, and is the accuracy of your search results akin to the postmodern notion of meaning: no facts with a capital F, no objectivity, no Truth in advertising?
A (modified) quote from the great playwright comes to mind:
“There is nothing either good or bad but [tagging] makes it so.” William Shakespeare, Hamlet, Act II, Sc. II (with “thinking” replaced by “tagging”).
Analysts often discuss the impact of the Internet and email on culture. There may arise similar discussions about the impact of the semantic web on culture as the information that people find, hold on to, and make use of may be viewed as accurate and true – but all the while, it is only a product of the collective musings, however ill-informed, of the masses.
Conversely, some areas of the semantic web will most likely be very well informed; this could create quite a disparity in the quality of, and therefore the benefit from, different parts of the semantic web.
Michael Marshall is an innovative software developer, trainer, and consultant in the Search Marketing industry. He has more than 19 years of experience in information technology covering a wide range of specialties, including web design, software engineering, e-commerce solutions, artificial intelligence, and Internet marketing. He is also a certified instructor at the North Carolina Search Engine Academy.
