Richard Wallis has written an article about the latest updates to WorldCat.org. He writes, “After we experimentally added RDFa embedded linked data, using Schema.org markup and some proposed Library extensions, to WorldCat pages, one of the most often questions I was asked was where can I get my hands on some of this raw data? We are taking the application of linked data to WorldCat one step at a time so that we can learn from how people use and comment on it. So at that time if you wanted to see the raw data the only way was to use a tool [such as the W3C RDFa 1.1 Distiller] to parse the data out of the pages, just as the search engines do.”

He goes on, “So I am really pleased to announce that you can now download a significant chunk of that data as RDF triples. Especially in experimental form, providing the whole lot as a download would have been a bit of a challenge, even just in disk space and bandwidth terms. So which chunk to choose was a question. We could have chosen a random selection, but decided instead to pick the most popular, in terms of holdings, resources in WorldCat – an interesting selection in its own right. To make the cut, a resource had to be held by more than 250 libraries. It turns out that almost 1.2 million fall into this category, so a sizeable chunk indeed. To get your hands on this data, download the 1Gb gzipped file. It is in RDF n-triples form, so you can take a look at the raw data in the file itself. Better still, download and install a triplestore [such as 4Store], load up the approximately 80 million triples and practice some SPARQL on them.”
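For readers who want a first look at the file before setting up a triplestore, here is a minimal Python sketch that streams a gzipped N-Triples file and counts how often each predicate appears. It is only an illustration: the file name `worldcat.nt.gz` is a placeholder rather than the real download name, the function names are made up for this example, and the regular expression is a simplified line matcher, not a full N-Triples parser.

```python
import gzip
import re
from collections import Counter

# Simplified N-Triples line pattern (an assumption, not a full parser):
# subject (IRI or blank node), predicate (IRI), then the object up to the final " ."
TRIPLE_RE = re.compile(r'^(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+(.+?)\s*\.\s*$')


def iter_triples(lines):
    """Yield (subject, predicate, object) tuples from N-Triples lines."""
    for line in lines:
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith('#'):
            continue
        match = TRIPLE_RE.match(line)
        if match:
            yield match.groups()


def count_predicates(lines):
    """Count how many triples use each predicate."""
    return Counter(predicate for _, predicate, _ in iter_triples(lines))


def count_predicates_in_file(path):
    """Stream a gzipped N-Triples file without unpacking it to disk."""
    with gzip.open(path, 'rt', encoding='utf-8') as handle:
        return count_predicates(handle)


if __name__ == '__main__':
    # 'worldcat.nt.gz' is a hypothetical local file name for the download.
    for predicate, total in count_predicates_in_file('worldcat.nt.gz').most_common(10):
        print(total, predicate)
```

Because the file is read line by line, memory use stays small even with tens of millions of triples, although a full pass will still take a while.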

Read more here.

Image: Courtesy WorldCat