I’m very happy to announce that the World Wide Web Consortium’s RDB2RDF Working Group, in which I participate as an Invited Expert, has published two Candidate Recommendations: R2RML: RDB to RDF Mapping Language and A Direct Mapping of Relational Data to RDF. This has been a long road and we still have some way to go. The standardization process goes back to the W3C Workshop on RDF Access to Relational Databases, which took place in October 2007. The W3C RDB2RDF Incubator Group followed afterwards. After almost 5 years, we are on track to have a standard. However, what is this standard bringing to the table?

Data on the Web

The Semantic Web is about publishing structured data on the web, interlinking the data, and providing new and innovative mechanisms to search, find and discover information. One way of publishing structured data on the web is through RDFa on your HTML page, following guides such as schema.org. Another way could be using the new RDB2RDF standards! Given structured data on the web, search engines can now present rich snippets instead of a plain blue link, or offer faceted browsing such as Google’s Recipe Search or the DBpedia Faceted Browser. Semantic search engines such as Sindice can even let you run SPARQL queries on vast amounts of data.

Data Integration

Semantic Web technologies can also be applied in the enterprise to integrate data residing in different sources. And where do you think the majority of this data is stored? Relational Databases! The flexibility of the RDF graph data model makes it very easy to integrate data. Just imagine you have two different databases that are exposed as different RDF graphs and you want to combine them into one RDF graph. What do you do? You can simply merge nodes or create links between the nodes. Data warehousing or federation systems can then let you query all the data together. Data integration systems using semantic web technologies are what the DoD is looking into in order to create an Enterprise Information Web.
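
To make the "merge nodes or create links" idea concrete, here is a minimal sketch in Python. It models each RDF graph as a plain set of (subject, predicate, object) triples rather than using an RDF library, and every IRI and value in it is hypothetical. It also ignores blank nodes, which a real RDF merge would have to keep apart.

```python
# Each RDF graph is modelled as a set of (subject, predicate, object) triples.
# All IRIs and values below are hypothetical, for illustration only.
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

hr_graph = {
    ("http://hr.example/Employee/1", FOAF_NAME, "Juan Sequeda"),
}
crm_graph = {
    ("http://crm.example/Contact/42", "http://crm.example/onto/email", "juan@example.org"),
}


def merge_graphs(*graphs):
    """Combine several graphs into one: a set union of their triples."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def link_nodes(graph, node_a, node_b):
    """Return a new graph with an owl:sameAs link between two nodes."""
    return graph | {(node_a, OWL_SAME_AS, node_b)}


merged = link_nodes(
    merge_graphs(hr_graph, crm_graph),
    "http://hr.example/Employee/1",
    "http://crm.example/Contact/42",
)

for triple in sorted(merged):
    print(triple)
```

Because a graph is just a set of triples, combining two sources needs no schema alignment up front. The link between the two nodes is itself one more triple.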

R2RML and Direct Mapping

R2RML and Direct Mapping are the two important pieces of the puzzle. Direct Mapping is a default mapping that automatically generates RDF from the relational data, with the push of a button. R2RML is a mapping language in which a user can customize which relational tables and columns get mapped to RDF using a specific vocabulary/ontology. Direct Mapping and R2RML complement each other. As a first step, a user may want to run the Direct Mapping to see what the RDF looks like and to have a pre-populated R2RML file. Afterwards, the user can customize the R2RML. Consider the following SQL DDL:

CREATE TABLE Employee (
     id int PRIMARY KEY,
     name VARCHAR(100)
);

INSERT INTO Employee (id, name) VALUES (1, 'Juan Sequeda');

The RDF that is generated automatically through the Direct Mapping would be the following:

<http://ex.com/Employee/id-1>
     <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
     <http://ex.com/Employee>.
<http://ex.com/Employee/id-1>
     <http://ex.com/Employee#id>
     "1"^^<http://www.w3.org/2001/XMLSchema#integer>.
<http://ex.com/Employee/id-1>
     <http://ex.com/Employee#name>
     "Juan Sequeda".

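For readers who want to see the mechanics, the following is a rough sketch of how such a default mapping could be produced, using Python's built-in sqlite3 module. It only mirrors the IRI pattern of the output shown above for a table with a single-column primary key; it is not an implementation of the Direct Mapping specification, which defines the exact rules. The function name `direct_map` and the use of SQLite are my own assumptions for illustration.

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def direct_map(conn, table, base="http://ex.com/"):
    """Return N-Triples-style lines for every row of `table`.

    Hypothetical helper: handles only single-column primary keys and
    follows the IRI pattern of the example output, not the full spec.
    """
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    columns = [row[1] for row in info]
    pk_columns = [row[1] for row in info if row[5] > 0]
    if len(pk_columns) != 1:
        raise ValueError("this sketch supports exactly one primary key column")
    pk = pk_columns[0]

    lines = []
    for row in conn.execute(f'SELECT * FROM "{table}"'):
        values = dict(zip(columns, row))
        subject = f"<{base}{table}/{pk}-{values[pk]}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{base}{table}> .")
        for column in columns:
            value = values[column]
            if value is None:
                continue  # NULL values produce no triple
            if isinstance(value, int):
                literal = f'"{value}"^^<{XSD_INTEGER}>'
            else:
                escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
                literal = f'"{escaped}"'
            lines.append(f"{subject} <{base}{table}#{column}> {literal} .")
    return lines


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (id int PRIMARY KEY, name VARCHAR(100))")
conn.execute("INSERT INTO Employee (id, name) VALUES (1, 'Juan Sequeda')")

for line in direct_map(conn, "Employee"):
    print(line)
```

The point of the sketch is that everything needed (table name, column names, primary key, values) comes from the database itself, which is why no user input is required.
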
Not bad, if all of that came out with a push of a button. But what happens if I want to customize the URIs and use existing vocabularies? This is where R2RML comes in. The following R2RML:

@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://ex.com/onto/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<TriplesMap1>
  a rr:TriplesMap;
  rr:logicalTable [ rr:tableName "Employee" ];
  rr:subjectMap [ rr:template "http://mycompany.com/Employee/{id}";
                          rr:class ex:Employee; ];
  rr:predicateObjectMap [ rr:predicateMap [ rr:constant ex:id ];
                          rr:objectMap    [ rr:column "id" ] ];
  rr:predicateObjectMap [ rr:predicateMap [ rr:constant foaf:name ];
                          rr:objectMap    [ rr:column "name" ] ] .

would generate the following RDF:

<http://mycompany.com/Employee/1>
     <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
     <http://ex.com/onto/Employee>.
<http://mycompany.com/Employee/1>
     <http://ex.com/onto/id>
     "1"^^<http://www.w3.org/2001/XMLSchema#integer>.
<http://mycompany.com/Employee/1>
     <http://xmlns.com/foaf/0.1/name>
     "Juan Sequeda".

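To illustrate what a mapping like this asks a processor to do for each row, here is a small Python sketch. It does not parse R2RML; the mapping is transcribed by hand into a dictionary, and the names `mapping`, `expand_template` and `apply_mapping` are hypothetical. Percent-encoding the template values with `urllib.parse.quote` is only an approximation of the IRI-safe encoding that the R2RML specification describes.

```python
import re
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hand-transcribed view of the mapping: subject template, class, and
# (predicate IRI, column name) pairs. Column names follow the DDL above.
mapping = {
    "subject_template": "http://mycompany.com/Employee/{id}",
    "class": "http://ex.com/onto/Employee",
    "predicate_object_maps": [
        ("http://ex.com/onto/id", "id"),
        ("http://xmlns.com/foaf/0.1/name", "name"),
    ],
}


def expand_template(template, row):
    """Replace each {column} placeholder with the row's percent-encoded value."""
    return re.sub(
        r"\{([^{}]+)\}",
        lambda match: quote(str(row[match.group(1)]), safe=""),
        template,
    )


def apply_mapping(mapping, row):
    """Return the (subject, predicate, object) triples for a single row."""
    subject = expand_template(mapping["subject_template"], row)
    triples = [(subject, RDF_TYPE, mapping["class"])]
    for predicate, column in mapping["predicate_object_maps"]:
        if row[column] is not None:  # NULL values produce no triple
            triples.append((subject, predicate, row[column]))
    return triples


triples = apply_mapping(mapping, {"id": 1, "name": "Juan Sequeda"})
for triple in triples:
    print(triple)
```

Compared with the Direct Mapping, the only inputs that changed are the ones the user chose: the subject IRI template, the class, and the predicates.
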
What’s next?

If you are interested in learning more about Direct Mapping and R2RML, don’t miss the presentations at SemTechBiz in San Francisco in June, which will include a tutorial on SPARQL Access to SQL Databases, a panel on Implementations of R2RML, and a presentation on RDB2RDF: SPARQL as fast as SQL.

The full program is available here, and registration here.

At this stage, the W3C is inviting implementations of R2RML and Direct Mapping, and inviting implementers to participate in the Test Cases. Instructions on how to submit implementation reports can be found on the W3C RDB2RDF Wiki. Comments and test case reports are welcome through 30 April 2012. Given successful implementations, the standards will move on to Proposed Recommendation and then finally to W3C Recommendation.