Identifiers Are at the Heart of Information Technology
Identifiers have been around since the dawn of the computer age. They are special tokens used to keep track of anything that’s processed in computers. For ages no one cared about identifiers except for technologists. This all changed with the advent of the Internet.
When the very first electronic mail systems were patched together across departments of US defense agencies, they used the equivalent of "send this message to room 6 on the second floor of the red building on the big hill by the pond". As this early e-mail network grew, it became obvious that this manner of routing mail could not continue to work. Sooner or later those routing the mail would start to confuse the big hills and red buildings. To address this, the early architects of the Internet designed a two-tier identifier system.
First of all you identify the organization, and then you identify the party within that organization to whom the message is specifically addressed. The first identifier is what we know as a domain name, managed through the domain name system. The domain semanticreport.com identifies, uniquely across the Internet, the organization that manages this very journal. The identifier happens to include the classification "com", for a commercial organization, which has become less and less meaningful with the mad scramble for attractive domain names. This scramble itself demonstrates how important such identifiers have become in the Internet age. To complete the e-mail address you add the more specific identifier for the party within the organization. It might be a role, such as "editor@semanticreport.com", or a personal name, such as "scott.koegler@semanticreport.com".
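The two tiers are easy to see if you pull an address apart. The following is only a minimal sketch in Python; the names `Address` and `split_address` are made up for illustration, and real e-mail address syntax has more corner cases than this handles.

```python
from typing import NamedTuple


class Address(NamedTuple):
    """The two tiers of an e-mail address (hypothetical helper type)."""
    party: str         # role or person within the organization
    organization: str  # domain name identifying the organization


def split_address(address: str) -> Address:
    """Split an address into party and organization tiers."""
    # Split on the last "@", since a domain name never contains one.
    party, sep, organization = address.rpartition("@")
    if not sep or not party or not organization:
        raise ValueError(f"not a two-tier address: {address!r}")
    # Domain names are case-insensitive, so normalize that tier only.
    return Address(party, organization.lower())


if __name__ == "__main__":
    print(split_address("editor@semanticreport.com"))
    print(split_address("scott.koegler@semanticreport.com"))
```

Both example addresses share the same organization tier and differ only in the party tier, which is exactly the division of labor described above.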
This identifier system has been successful enough to power not only e-mail but also the Web, instant messaging and other globally networked systems. The Web required a way to find not just people or roles, but documents, directories, applications, and other information resources. An information resource is anything whose essence can be downloaded over the network. The Web’s mechanism is the uniform resource locator (URL), which is designed to be more directions to the resource (a locator) than a unique identifier for that resource. A single document might have more than one URL, if published at more than one place.
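To illustrate the difference between locating and identifying, here is a small Python sketch using the standard library's `urllib.parse`. The two URLs are invented for the example, on the assumption that the same document has been published at both places; the helper name `locator_parts` is likewise hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical URLs: assume the same document is published at both places.
URLS = [
    "http://semanticreport.com/articles/identifiers.html",
    "http://mirror.example.org/reports/identifiers.html",
]


def locator_parts(url: str) -> dict:
    """Break a URL into the directions it gives for reaching a resource."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,  # how to talk to the server
        "host": parts.hostname,  # which organization's server to ask
        "path": parts.path,      # where on that server to look
    }


if __name__ == "__main__":
    for url in URLS:
        print(locator_parts(url))
```

Nothing in either URL says that the two point at the same document. Each one only gives directions, which is why a locator alone makes a weak identifier.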
More and Better
The next step in Web and semantic technology is two-fold. First of all, we want to have uniform resource identifiers, not just locators, so we can refer to a given information resource precisely, even if it happens to be published at several locations. Secondly, we want to be able to identify non-information resources as well: people, concepts, places, etc. If we can do that, we can tag information resources with more useful information. For example, it’s useful to tag this article with its author. You could just tag it with "Uche Ogbuji", but (and this might surprise some in the West) mine is not a unique name, and there could be confusion about which "Uche Ogbuji" the tag means. A better solution would be to use my e-mail address, which already has some disambiguation built in through the domain system. In reality, however, people do change e-mail addresses, so even that can be a rickety solution.
Ideally, the person of Uche Ogbuji would have a unique identifier you can use to unambiguously tag information resources. There are many solutions for this problem. We could establish an authority over IDs and use the domain for that authority to establish URLs that are designed not to change over time. This is the idea behind the persistent URL (PURL) initiative. PURLs are URLs that serve as strong identifiers because the manager of the PURL service is committed to preserving their integrity. Another approach is to address the potential ambiguity within the URI itself. The tag URI scheme (http://www.taguri.org/) includes a date, a domain and a specifier within the URI. An example is "tag:ogbuji.net,2000-01-01:Uche.Ogbuji". Building in the date takes care of the fact that domains can change hands. The authority for each tag URI would have to pick a date on which it had proper control over that domain. It’s not ideal, but it’s better than URLs of arbitrary authority.
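The three parts of a tag URI can be pulled out mechanically. What follows is a rough Python sketch based on a simplified reading of the tag URI syntax: it ignores fragments and does not check that the date is a real calendar date. The names `TagURI` and `parse_tag_uri` are invented for the example.

```python
import re
from typing import NamedTuple


class TagURI(NamedTuple):
    """Parts of a tag URI (hypothetical helper type)."""
    authority: str  # domain (or e-mail address) of the minting authority
    date: str       # date on which the authority controlled that name
    specific: str   # specifier chosen by the authority


# Simplified pattern: tag:<authority>,<YYYY[-MM[-DD]]>:<specific>
_TAG_PATTERN = re.compile(
    r"^tag:(?P<authority>[^,:]+),"
    r"(?P<date>\d{4}(?:-\d{2}(?:-\d{2})?)?):"
    r"(?P<specific>.*)$"
)


def parse_tag_uri(uri: str) -> TagURI:
    """Split a tag URI into authority, date and specifier."""
    match = _TAG_PATTERN.match(uri)
    if match is None:
        raise ValueError(f"not a tag URI: {uri!r}")
    return TagURI(
        match.group("authority"),
        match.group("date"),
        match.group("specific"),
    )


if __name__ == "__main__":
    print(parse_tag_uri("tag:ogbuji.net,2000-01-01:Uche.Ogbuji"))
```

Two tag URIs with the same domain and specifier but different dates parse to different values, which is how the scheme copes with a domain changing hands.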
The Web is evolving in this direction, and semantic technology brings a great deal of benefit to organizations that follow suit. At present most businesses find their data using means even cruder than "room 6 on the second floor of the red building on the big hill by the pond". In the next installment of this column I’ll discuss how organizational IT can benefit from clean, universal identifiers.