Image: Monarch Butterfly (Danaus plexippus), male

Yes, you are right: one of the prime reasons for this post is an excuse to show some stunning pictures from nature. However, there is also good reason to explore a Linked Data example provided by Pete DeVries of the Department of Entomology at the University of Wisconsin – Madison, in his recent submission to the W3C public-lod mailing list.

Pete shows a good example of the benefits of Linked Data. He provides links to information about the Monarch Butterfly (Danaus plexippus) from several Linked Data data sets, each of which provides a different but overlapping view.

For instance, from BBC Nature data you get links identifying relevant video clips from some of their programmes; from UniProt data you get links to the Monarch’s place in the NCBI Taxonomy; from the TaxonConcept data you can follow links to find observation data and expected location data; and from the DBpedia data you can get publicly contributed descriptions in fourteen languages. All of these data sets share information about name and genus, links to images, plus sameAs or other specifically defined relationships with identifiers in other data sets.

As he muses, “One could now assign these to be sameAs and pull all this information together, or just pull in those parts that they think are appropriate.” In fact, if you use SameAs.org to check the DBpedia URI (identifier), you will find over forty other related identifiers.

He goes on to say something I have sympathy with: “I think some groups are so concerned about keeping their walled garden and defending it from other walled gardens that they are missing the big picture.” The question is why. All the sources show adherence to the Linked Data principles by linking to other resources outside of their garden, so they are not totally closed in their thinking. Is it up to the data publishers to deliver the global cloud of accessible interlinked data sets, or just to provide the data for someone else to aggregate and deliver?

If you are a butterfly and Linked Data enthusiast, you could pull data from the resources that Pete identifies into a single free triple store: download and run one from someone like OpenLink, or get your own from a service such as Kasabi.com. It would not take much work to merge the data graphs together, resulting in the ability to query for things such as videos of butterflies found in a particular area.
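To make the idea of merging graphs concrete, here is a minimal sketch in plain Python rather than a real triple store. Everything in it is an assumption made for illustration: the prefixed identifiers (bbc:, txn:, dbpedia:, ex:), the predicates, and the two tiny graphs are invented and are not the actual data published by the BBC, TaxonConcept or DBpedia. It shows three steps: take the union of the graphs, collapse identifiers connected by sameAs into one, then ask for videos of anything observed in a given region.

```python
# Toy triples as (subject, predicate, object). All identifiers are made up.
SAME_AS = "owl:sameAs"

bbc = {
    ("bbc:monarch", "rdfs:label", "Monarch butterfly"),
    ("bbc:monarch", "ex:hasVideo", "bbc:clip/monarch-migration"),
    ("bbc:monarch", SAME_AS, "dbpedia:Monarch_butterfly"),
}
taxon = {
    ("txn:danaus-plexippus", "ex:observedIn", "ex:Wisconsin"),
    ("txn:danaus-plexippus", SAME_AS, "dbpedia:Monarch_butterfly"),
}


def merge(*graphs):
    """Merging graphs of triples is simply a set union."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def smush(triples):
    """Rewrite every identifier linked by sameAs to one shared identifier."""
    parent = {}

    def find(node):
        # Follow parent links to the representative, shortening the path.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for s, p, o in triples:
        if p == SAME_AS:
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                # Keep the alphabetically smallest identifier as representative.
                parent[max(root_s, root_o)] = min(root_s, root_o)

    return {(find(s), p, find(o)) for s, p, o in triples if p != SAME_AS}


def videos_for_region(triples, region):
    """Videos of anything that has been observed in the given region."""
    seen_there = {s for s, p, o in triples if p == "ex:observedIn" and o == region}
    return sorted(o for s, p, o in triples if p == "ex:hasVideo" and s in seen_there)


combined = smush(merge(bbc, taxon))
print(videos_for_region(combined, "ex:Wisconsin"))
```

Without the sameAs step the query finds nothing, because the video hangs off the BBC identifier and the observation hangs off the TaxonConcept one. A real triple store would do the same job with RDF loading and a SPARQL query.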

Most people are not Linked Data enthusiasts, so they might need a bit of help or a simple tool to do it for them. A simple example of this is demonstrated by LinkSailor, a prototype dynamic link-follower from Ian Davis. Still, you must really want to do this to make it worth the effort. If the global cloud of interlinked data sets is going to become a reality, it needs to widen its scope of implementers beyond the motivated enthusiasts.

Unsurprisingly, Google are up to something in this area. On the surface it is all about delivering enhanced listing displays in search results – Rich Snippets. To gain the prize of an enhanced listing, web site builders need to embed some structured data into their HTML in a format guided by schema.org. As demonstrated by a plug-in for the popular open source content management system Drupal, this could be done automatically. The plug-in uses RDFa to embed RDF (the data format used to describe resources in Linked Data) into pages, which Google and others can harvest.
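For a feel of what harvesting embedded data involves, the sketch below reads the RDFa-style attributes out of a small HTML fragment using only the Python standard library. It is a deliberately naive illustration, not a conforming RDFa processor and certainly not how Google does it. The page fragment, the event and its date are invented; vocab, typeof, property and content are RDFa attributes, and Event, name and startDate are schema.org terms.

```python
from html.parser import HTMLParser

# An invented page fragment marked up with schema.org terms via RDFa attributes.
PAGE = """
<div vocab="http://schema.org/" typeof="Event">
  <h1 property="name">Monarch migration walk</h1>
  <time property="startDate" content="2012-09-15T10:00">15 September, 10am</time>
</div>
"""


class PropertyCollector(HTMLParser):
    """Collects the item type and a flat property -> value mapping."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._pending = None  # property still waiting for its text value

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "typeof" in attrs:
            self.item_type = attrs["typeof"]
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # A machine-readable value takes precedence over the visible text.
            self.properties[prop] = attrs["content"]
        else:
            self._pending = prop

    def handle_data(self, data):
        # The first non-blank text after a property tag becomes its value.
        if self._pending and data.strip():
            self.properties[self._pending] = data.strip()
            self._pending = None


def harvest(html):
    """Return (item type, properties) found in the given HTML."""
    collector = PropertyCollector()
    collector.feed(html)
    return collector.item_type, collector.properties


item_type, properties = harvest(PAGE)
print(item_type, properties)
```

The point of the sketch is that the structured data travels inside the ordinary page: the same markup a visitor reads also tells a harvester that this is an event, what it is called, and when it starts.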

Today, Google are apparently only using this data to enhance search results with things such as times for events, or preparation time for recipes.  Once the competitive race for rich snippet enhanced results kicks off, they are going to be amassing a significant amount of structured data about things on the web.  Link this harvested structured data with that of the Linked Data web described by Pete (much of which they already host in Freebase) and much of the work will be in place. Image and background to Google indexing RDFa from Manu Sporny.

We are already getting hints about these possibilities. For example, Google’s Amit Singhal was recently quoted as saying Google is “building a huge, in-house understanding of what an entity is and a repository of what entities are in the world and what should you know about those entities.” If that is the case, maybe as data publishers we can limit our responsibilities to sharing our data, with links to other data that is conceptually nearby, using Linked Data principles, and let the Googles and the Wikidatas of the world do the aggregating and wider linking for us. I see this as the beginnings of a data-driven wave of innovation which is approaching us fast, something I have previously explored in more depth.

So, it may not be very long before the click of a mouse in the BBC’s natural history unit results in a new video of a butterfly’s flapping wing appearing on the local fauna page of your nearest national park.

Richard Wallis is Founder of Data Liberate.
Butterfly images from Wikimedia Commons and BBC.