A recent article by Michael Hausenblas discusses “why we link.” The article begins, “The incentives to put structured data on the Web seem to slowly seep in, but why does it make sense to link your data to other data? Why invest time and resources to offer 5 star data? Even though the interlinking itself becomes more of a commodity these days – for example, the 24/7 platform we’re deploying in LATC is an interlinking cloud offering – the motivation for dataset publishers to set links to other datasets is, in my experience, not obvious.”
It goes on, “I think it’s important to have a closer look at the motivation for interlinking data on the Web from a data integration perspective. Traditionally, you would download data from, say, Infochimps or you find it via CKAN or via the many other places that either directly offer data or provide a data catalog. Then you would put it in your favorite (NoSQL) database and use it in your application. Simple, isn’t it?”
The article continues, “Let’s say you’re using a dataset about companies such as the Central Contractor Registration (CCR). These companies typically have a physical address (or: location) attached. Now, imagine I ask you to render the location of a selection of companies on a map. This requires you to look up the geographical coordinates of a company in a service such as Geonames. I bet you can automate this, right? Maybe a bit of manual work involved, but not too much, I guess. So, all is fine, right? Not really. The next developer that comes along and wants to use the company data and nicely map it has to go through the exact same process. Figure what geo service to use, write some look-up/glue code, import the data and so on.”
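The “look-up/glue code” step the article describes might look something like the minimal sketch below: enrich company records with coordinates so they can be rendered on a map. Note that `GEO_SERVICE`, the sample company names, and the coordinates are all hypothetical stand-ins; in practice the lookup would call a real geocoding web service such as GeoNames rather than a local dictionary.

```python
# Hypothetical stand-in for a geocoding service such as GeoNames,
# mapping a location string to (latitude, longitude). A real version
# would call the service's web API instead of a local dict.
GEO_SERVICE = {
    "Austin, TX": (30.2672, -97.7431),
    "Washington, DC": (38.8950, -77.0366),
}

def geocode(location):
    """Return (lat, lon) for a location string, or None if unknown."""
    return GEO_SERVICE.get(location)

def enrich_companies(companies):
    """Attach coordinates to each company record whose address resolves."""
    enriched = []
    for company in companies:
        coords = geocode(company["location"])
        if coords is not None:
            enriched.append({**company,
                             "lat": coords[0],
                             "lon": coords[1]})
    return enriched

# Sample records standing in for a dataset like the CCR.
companies = [
    {"name": "Acme Corp", "location": "Austin, TX"},
    {"name": "Globex", "location": "Unknown City"},
]
print(enrich_companies(companies))
```

This is exactly the glue code the article argues every developer ends up rewriting: pick a geo service, write the lookup, merge the results into the dataset. Interlinking the company data with the geo data once, at the source, would let every downstream consumer skip these steps.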
Read more here.
Image: Courtesy Flickr/ pratanti