By Juan Sequeda

This year's World Wide Web Conference, the 21st, was held in Lyon, France. The conference is a unique forum for discussing how the Web is evolving. There were hundreds of talks over 3 days. Let me summarize some of the Semantic Web presentations I was able to attend.

NautiLOD

Programmers use the wget tool daily to specify and retrieve data on the Web. However, wget is limited because it cannot dig into the semantics of Web data to do the job. What if you were to add semantics to wget? This is the question that Valeria Fionda, Claudio Gutierrez and Giuseppe Pirró asked themselves. They took that question to the next level: imagine a semantic wget on top of Linked Data. They wanted to create a language to declaratively specify portions of the Web of Data, define routes, and instruct agents that can do things for you on the Web, all by exploiting the semantics of the information (RDF data) found in online data sources. For example: find all the Wikipedia pages of directors who have been influenced by Stanley Kubrick and send them to my email, or retrieve information about David Lynch from different information providers. These examples only give a hint of what can be done. The researchers developed a simple, generic declarative language, NautiLOD, and implemented it in swget (semantic wget). swget comes in two flavors: a simple command line tool (to give the Web back to users) and a GUI. This is not a fantasy anymore. Check it out for yourself (http://swget.wordpress.com).

SPARQL 1.1 Yottabyte

Chilean scientists Marcelo Arenas, Sebastian Conca and Jorge Perez have been actively studying the new SPARQL 1.1 property path proposal and discovered an important flaw in the W3C specification. They presented a paper titled “Counting Beyond a Yottabyte or How…”. With property paths in SPARQL, it would be possible to figure out, for example, who all the friends of my friends of my friends… of my friends are. In their paper, they proved that any SPARQL property path implementation would have to generate 79 yottabytes of data in order to run such a query on an RDF file of only 14 kilobytes. To put things in context, if you put all the data in the digital world together, it would be less than 1 yottabyte. Even if P = NP, this would still be a hard problem. However, do not fear! The scientists offer a solution, which has been accepted by the W3C SPARQL Working Group. It was no surprise when this paper received the Best Paper award at the WWW Conference. It is worth noting that Marcelo Arenas, Jorge Perez and Claudio Gutierrez were the scientists who came up with the formal semantics of SPARQL that was adopted by the W3C and implemented in every RDF database that supports the SPARQL standard.
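To get a feel for why counting paths is so much more expensive than checking that one exists, here is a toy Python sketch. It is not the W3C counting procedure analyzed in the paper; it simply counts the simple paths between two people in a small, fully connected "friends" graph (a hypothetical example, as are all the function names) and contrasts that with a plain reachability check.

```python
from collections import deque


def complete_graph(n):
    """Hypothetical toy graph: every person is a friend of every other person."""
    return {u: [v for v in range(n) if v != u] for u in range(n)}


def count_simple_paths(graph, source, target, visited=None):
    """Count the paths from source to target that never revisit a node."""
    if visited is None:
        visited = {source}
    if source == target:
        return 1
    total = 0
    for nxt in graph[source]:
        if nxt not in visited:
            total += count_simple_paths(graph, nxt, target, visited | {nxt})
    return total


def is_reachable(graph, source, target):
    """Breadth-first search: each node is visited at most once."""
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


if __name__ == "__main__":
    # Print the path count and the yes/no answer for growing graph sizes.
    for n in range(3, 10):
        g = complete_graph(n)
        print(n, count_simple_paths(g, 0, 1), is_reachable(g, 0, 1))
```

Even in this toy setting the number of paths grows roughly factorially with the number of people, while the yes/no reachability check only ever touches each node once.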

Question Answering on SPARQL endpoints

Imagine asking natural language questions on top of a SPARQL endpoint, for example, “Which cities have more than three universities?” Big steps toward this objective have been taken, thanks to the work of Christina Unger et al. They presented AutoSPARQL, which lets you ask natural language questions on top of DBpedia. Check it out for yourself: http://autosparql-tbsl.dl-learner.org/.
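As a rough sketch of what such a system has to produce, the question above could correspond to a SPARQL query with a grouping and a count filter. The Python helper below fills a hypothetical query template. The function name, the template itself, and the choice of `dbo:University` and `dbo:city` as the matching DBpedia terms are assumptions for illustration, not AutoSPARQL's actual output.

```python
# Hypothetical template with three slots: the class of the things being
# counted, the property linking them to the answer, and the threshold.
TEMPLATE = """PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?answer WHERE {{
  ?item a {item_class} .
  ?item {link_property} ?answer .
}}
GROUP BY ?answer
HAVING (COUNT(?item) > {threshold})"""


def build_count_query(item_class, link_property, threshold):
    """Fill the template slots and return the SPARQL query as a string."""
    return TEMPLATE.format(
        item_class=item_class,
        link_property=link_property,
        threshold=int(threshold),
    )


if __name__ == "__main__":
    # "Which cities have more than three universities?"
    # (the vocabulary terms are assumed, see the note above)
    print(build_count_query("dbo:University", "dbo:city", 3))
```

The hard part, which this sketch skips entirely, is deciding from the words of the question which template to use and which vocabulary terms belong in its slots.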

RDB2RDF

Juan Sequeda, Marcelo Arenas and Daniel Miranker presented a study on how to directly map relational databases to RDF and OWL, especially relational databases that contain NULL values, which are common in practice. The current W3C Direct Mapping specification does not clearly address NULLs. This work presents a new Direct Mapping, an extension of the W3C’s Direct Mapping, that can handle NULLs and is proven to be information-preserving and query-preserving. In other words, if you directly map your relational database to RDF, no information will be lost, and all your SQL queries can be expressed in SPARQL. Before this work, there was no guarantee of this. This Direct Mapping, along with the W3C’s Direct Mapping and R2RML, has been implemented in Ultrawrap, an RDB2RDF tool provided by Capsenta.
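To make the idea of a direct mapping concrete, here is a minimal Python sketch that turns one table row into triples, using IRI patterns modeled on the W3C Direct Mapping (a row IRI built from the primary key, the table as the row's type, one predicate per column). It is a simplification and not the extended mapping from the paper or Ultrawrap's implementation: the function name and the `http://example.com/db/` base are hypothetical, and foreign keys, datatypes and IRI escaping are left out.

```python
def direct_map_row(table, primary_key, row, base="http://example.com/db/"):
    """Return (subject, predicate, object) triples for one relational row.

    Simplified sketch: single-column primary key, every value rendered
    as a plain string literal, no foreign keys.
    """
    subject = f"<{base}{table}/{primary_key}={row[primary_key]}>"
    # The table itself acts as the class of the row.
    triples = [(subject, "rdf:type", f"<{base}{table}>")]
    for column, value in row.items():
        if value is None:
            # SQL NULL: this sketch emits no triple for the column.
            continue
        triples.append((subject, f"<{base}{table}#{column}>", f'"{value}"'))
    return triples


if __name__ == "__main__":
    # Hypothetical row with a NULL address.
    row = {"ID": 7, "fname": "Bob", "addr": None}
    for triple in direct_map_row("People", "ID", row):
        print(" ".join(triple), ".")
```

In this sketch the NULL column leaves no trace in the output, which hints at why NULLs need careful treatment if the mapping is to preserve both information and queries.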

Linked Enterprise Data Panel

The Linked Enterprise Data panel was moderated by Christian Fauré. The panelists were Cornelia Davis (EMC), Jean-Louis Vila (Sword), Fabrice Lacroix (Antidot) and Juan Sequeda (Capsenta/UT Austin). The discussion centered on 3 questions: What is Linked Enterprise Data (LED)? What benefits does LED have? What is missing in order to go forward? It was clear to everybody that Linked Enterprise Data is Linked Data behind the firewall, inside an enterprise’s intranet. A key point everybody agreed on was that adopting LED means focusing on the data and not so much on the ontology. It would be impossible to generate one ontology for the entire workflow of a company. The panelists mentioned the new W3C Linked Data Patterns Working Group Charter and IBM’s Linked Data Basic Profile submission, which is an important step toward the adoption of Linked Data in the enterprise. The main benefits of LED are the ability to create links between different data silos and to create a unique resource for every entity. However, Linked Data does not imply using RDF: EMC is using OData. Everybody agreed that with Linked Data, you can do things faster with fewer resources. This has also been stated by Lee Feigenbaum (Cambridge Semantics) and Kendall Clark (Clark & Parsia) as a key benefit of adopting semantic technologies. However, several things are still missing in order to go forward. There is no convincing answer to the question: What can I do with RDF that I can’t do with XML? Additionally, we need more tools that are easier for developers to use and that carry low risk.

The World Wide Web Conference was a great place to learn about current research results and where the Web is heading. Jim Hendler stated it nicely: “… feels like end of Web 2.0 is freeing things up!” Next year’s WWW Conference will be in Rio de Janeiro.