Juan Sequeda

This year was the fifth edition of the Linked Data on the Web (LDOW) Workshop, co-located with the World Wide Web Conference (WWW 2012) taking place in Lyon, France.

At this workshop, seven topics caught my attention:

1) Media: Yunjia Li presented “Synote: Weaving Media Fragments and Linked Data”. This work is interesting for anyone who wants to link not just to an entire video but to a specific time interval within it, and to attach metadata to that fragment.
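The W3C Media Fragments URI 1.0 specification addresses a time interval with a `t=` dimension in the URI fragment, e.g. `#t=10,20` for seconds 10 to 20 of a video. The Python sketch below is not Synote code. It parses only the plain-seconds form (optionally prefixed with `npt:`), and the function name and example URL are hypothetical.

```python
from urllib.parse import parse_qsl, urldefrag


def parse_temporal_fragment(uri):
    """Return (start, end) in seconds for a Media Fragments '#t=' URI, or None.

    Simplified sketch: handles only the plain-seconds form, optionally prefixed
    with 'npt:', e.g. '#t=10,20', '#t=npt:10.5,20', '#t=,20' or '#t=10'.
    Clock-style values such as '00:01:10' are not supported and raise ValueError.
    """
    _, fragment = urldefrag(uri)
    for name, value in parse_qsl(fragment, keep_blank_values=True):
        if name != "t":
            continue  # other dimensions (e.g. xywh=) are ignored here
        if value.startswith("npt:"):
            value = value[len("npt:"):]
        start_text, _, end_text = value.partition(",")
        start = float(start_text) if start_text else 0.0  # omitted start means 0
        end = float(end_text) if end_text else None  # omitted end means "to the end"
        return start, end
    return None
```

For example, `parse_temporal_fragment("http://example.org/video.mp4#t=10,20")` returns `(10.0, 20.0)`, while `#t=10` yields `(10.0, None)`, meaning "from second 10 until the end of the media".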

2) NLP to Linked Data: How can we relate the output of different named entity extraction tools to Linked Data? Giuseppe Rizzo introduced NERD (Named Entity Recognition and Disambiguation), a project working in this area.

3) Provenance: Data cannot live alone; it needs provenance. We need to know where data comes from, how far it can be trusted, in what context it was produced, and so on. Jun Zhao gave a presentation on this topic and reported on the status of the W3C Provenance Working Group.

4) Federated Queries: A key topic in data integration is the ability to federate queries across multiple sources. Ziya Akar presented Wodqa, their work on querying different SPARQL endpoints using VoID descriptions. Have a look for yourself.
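One way a federated engine can use VoID is source selection: a VoID description can list the properties a dataset uses (via `void:propertyPartition` and `void:property`), so each triple pattern is only sent to endpoints that could possibly match it. The sketch below illustrates that idea in plain Python. It is not Wodqa's actual algorithm, and the `DatasetDescription` class, endpoint URLs and `example.org` vocabulary are hypothetical. A real engine would read the VoID RDF itself and then dispatch sub-queries, for instance with the SPARQL 1.1 `SERVICE` keyword.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetDescription:
    """Hypothetical, stripped-down stand-in for a VoID dataset description."""
    sparql_endpoint: str  # corresponds to void:sparqlEndpoint
    properties: set = field(default_factory=set)  # values of void:propertyPartition/void:property


def select_sources(triple_patterns, descriptions):
    """Map each (s, p, o) triple pattern to the endpoints that might answer it.

    Terms starting with '?' are variables. A pattern whose predicate is a
    variable could match any dataset, so it is sent to every endpoint.
    """
    plan = {}
    for pattern in triple_patterns:
        predicate = pattern[1]
        if predicate.startswith("?"):
            plan[pattern] = [d.sparql_endpoint for d in descriptions]
        else:
            plan[pattern] = [
                d.sparql_endpoint for d in descriptions if predicate in d.properties
            ]
    return plan


if __name__ == "__main__":
    FOAF = "http://xmlns.com/foaf/0.1/"
    EX = "http://example.org/ontology/"  # hypothetical vocabulary
    descriptions = [
        DatasetDescription("http://example.org/films/sparql", {EX + "director", FOAF + "name"}),
        DatasetDescription("http://example.org/people/sparql", {FOAF + "name", FOAF + "birthday"}),
    ]
    query = [
        ("?film", EX + "director", "?person"),
        ("?person", FOAF + "name", "?name"),
        ("?person", FOAF + "birthday", "?date"),
    ]
    for pattern, endpoints in select_sources(query, descriptions).items():
        print(pattern[1], "->", endpoints)
```

Here the `director` pattern goes only to the films endpoint, `birthday` only to the people endpoint, and `name` to both; pruning endpoints this way is what keeps the number of remote requests down.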

5) Schema Matching: Another key topic in data integration is matching heterogeneous schemas. The work presented by Andreas Schultz reports on the performance of different data translation systems: R2R, Mosto and SPARQL 1.1. (Yes, you can do schema matching with SPARQL 1.1! That deserves another blog post.)
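To make the SPARQL 1.1 remark concrete: a mapping can be written as a `CONSTRUCT` query that reads data in the source schema and emits it in the target schema, using `BIND` and `CONCAT` for structural changes. The Python sketch below mimics two such rules over an in-memory set of triples: a one-to-one property rename and a two-to-one merge. The `source`/`target` vocabularies are hypothetical, and this is not how R2R, Mosto or the benchmark implement translation.

```python
# Hypothetical source and target vocabularies.
SRC = "http://example.org/source/"
TGT = "http://example.org/target/"

# One-to-one correspondences: source property -> target property.
RENAMES = {SRC + "director": TGT + "directedBy"}


def translate(triples):
    """Translate (s, p, o) triples from the source schema into the target schema.

    Roughly what these SPARQL 1.1 CONSTRUCT queries would produce (prefixes omitted):
      CONSTRUCT { ?f tgt:directedBy ?d } WHERE { ?f src:director ?d }
      CONSTRUCT { ?p tgt:fullName ?name }
      WHERE { ?p src:firstName ?first ; src:lastName ?last .
              BIND(CONCAT(?first, " ", ?last) AS ?name) }
    """
    result = set()
    first_names, last_names = {}, {}
    for s, p, o in triples:
        if p in RENAMES:
            result.add((s, RENAMES[p], o))  # simple property rename
        elif p == SRC + "firstName":
            first_names[s] = o  # simplification: keeps one value per subject
        elif p == SRC + "lastName":
            last_names[s] = o
    # Structural correspondence: two source properties merge into one target property.
    for s in first_names.keys() & last_names.keys():
        result.add((s, TGT + "fullName", f"{first_names[s]} {last_names[s]}"))
    return result
```

Triples that match no rule are dropped, just as a `CONSTRUCT` query only emits what its template describes.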

6) User Interfaces: The organizers mentioned that they had especially hoped for submissions on this topic, because user interfaces on top of Linked Data are still an area where much work remains to be done. Igor Popov presented Mashpoint, a step forward in that area. Go check it out.

7) OWL in Linked Data: Axel Polleres presented empirical results on how OWL is actually used in Linked Data. Based on these results, they propose a Linked Data-specific flavor of OWL, OWL-LD. Check it out here: http://semanticweb.org/OWLLD/

One of my personal highlights of the day was listening to Arnaud Le Hors from IBM talk about the real-world usage of Linked Data in IBM’s solutions. It was very pleasing to hear that IBM is already using Linked Data to solve its customers’ problems. Those customers want to mix and match products from different vendors and even develop their own components; they want not only data integration but also application integration. Arnaud Le Hors stated that an ideal solution should be distributed, scalable, reliable, extensible, simple and equitable. They quickly realized that they were describing the very features of the Web! Instead of putting the program at the center, they needed to put the data at the center, and the data needed a standard model, format and protocols for representing and accessing it. That is a description of RDF, OWL, SPARQL and the Linked Data principles.

In his presentation, Arnaud stated that the Linked Data principles do not go far enough, and several questions remain open: How do I create a resource? Where can I get the list of resources that already exist? Which vocabulary should I use? Which media types should I use? When containers get big, how do I split the information into pages? How do I specify ordering? The great part of this talk is that they also proposed a solution: the Linked Data Basic Profile proposal [Editor's Note: SemanticWeb.com is a co-sponsor of this proposal]. This proposal is the starting point for the W3C Linked Data Patterns Working Group Charter.

The other highlight of the day was the final panel discussion with Peter Mika (Yahoo!), Yves Raimond (BBC), Ivan Herman (W3C) and Tim Berners-Lee (W3C), on “Microdata, RDFa, Web APIs, Linked Data: Competing or Complementary?” Chris Bizer started by presenting two impressive statistics on the deployment of Microformats, Microdata and RDFa. Of the 1.4 billion HTML pages in the Common Crawl corpus, 13% contain structured data. Of the 3.2 billion HTML pages in the Bing crawl corpus reported by Yahoo! Research, 30% contain structured data. More information on these statistics can be found in this presentation.

Yves Raimond stated that RDFa and Microdata were only being used by search engines. Peter Mika stated that we are underestimating the importance of search engine support for structured web data; big ad agencies are starting to use it too. It was suggested that data publishers should use schema.org as a vocabulary to describe their data in general, not only to annotate their web pages with RDFa and Microdata.

Ivan Herman stated that two years ago the community didn’t buy the idea of structured data on the web, but that has changed. He coined the term “Jay Myers effect”: somebody who understands the technology and is able to make things happen inside a big organization, just as Jay Myers did at Best Buy. Erik Wilde tweeted “… there should be many more like Jay.” Herman also stated that Best Buy and Talis are good examples of serendipitous reuse; thanks to RDFa, for example, your website becomes your API. Additionally, Herman stated that the community is good at publishing data, but we need more users. The conversation then turned to JSON-LD, which Ivan Herman strongly advocates. However, Tim Berners-Lee warned that JSON-LD could do as much harm as RDF/XML if used wrongly.
TimBL stated that RDF/XML tried to hide RDF from the XML community, and he was worried that the same could happen with the JSON community. Are we repeating the same mistake? TimBL concluded with “Read/Write Linked Data is where it’s at.”
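TimBL’s concern is easier to see with an example: a JSON-LD document looks like ordinary JSON, but its `@context` turns each key into an RDF predicate, so the document is really an RDF graph. The toy Python function below shows that mapping for a single flat node with string values. It covers only a tiny subset of JSON-LD (no nesting, arrays, `@vocab`, compact IRIs or language tags), the `example.org` subject is hypothetical, and it is no substitute for the full expansion and RDF conversion algorithms in the JSON-LD specification.

```python
import json

doc = json.loads("""
{
  "@context": {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": {"@id": "http://xmlns.com/foaf/0.1/homepage", "@type": "@id"}
  },
  "@id": "http://example.org/people/alice",
  "name": "Alice",
  "homepage": "http://example.org/alice/"
}
""")


def to_triples(node):
    """Turn one flat JSON-LD node object with string values into N-Triples-style terms.

    Toy subset only: a single inline @context of term definitions, no nesting,
    arrays, @vocab, compact IRIs or language tags.
    """
    context = node.get("@context", {})
    subject = node["@id"]
    triples = []
    for key, value in node.items():
        if key.startswith("@") or key not in context:
            continue  # skip keywords; keys with no IRI mapping are dropped, as in JSON-LD expansion
        definition = context[key]
        if isinstance(definition, str):
            predicate, is_iri = definition, False
        else:
            predicate, is_iri = definition["@id"], definition.get("@type") == "@id"
        obj = f"<{value}>" if is_iri else json.dumps(value)  # IRI or quoted string literal
        triples.append((f"<{subject}>", f"<{predicate}>", obj))
    return triples


for s, p, o in to_triples(doc):
    print(s, p, o, ".")
```

Running it prints two N-Triples lines, one per property. A JSON developer who never looks at the `@context` may not realize that `"homepage"` is a link to another resource while `"name"` is a literal, which is exactly the kind of hidden RDF TimBL was warning about.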