If you missed last week’s excellent introduction to SPARQL by Bob DuCharme of TopQuadrant, author of the recently released Learning SPARQL, the recorded webcast is now available. In this presentation, Bob shows how to create and run SPARQL queries. He also talks about the role that the query language can play in application development. Lastly, he looks at the range of uses people are finding for SPARQL beyond querying RDF data, such as querying relational data, defining rules to enhance data quality, and more.

SPARQL Queries, SPARQL Technologies with Bob DuCharme - Watch the Webcast

Watch the webcast here:

http://mediabistro.adobeconnect.com/p8mwns7kdgx/

There were some questions we did not get to during the hour, and Bob has been kind enough to answer these offline.

BONUS Q&A with Bob DuCharme:

Q: Can SPARQL engines integrate reasoners and reason over the data on the fly?

A: As I mentioned on the webcast, it’s usually a case of reasoners integrating SPARQL engines (or an application like TopBraid integrating both). As for reasoning over data on the fly, that’s also an application issue: a SPARQL engine queries the data that you feed to it, so the feeding part is up to the application.

Q: Did TopBraid do the ET part (of ETL)?

A: TopBraid has built-in features to dynamically retrieve data from a variety of storage options (DBMSs, spreadsheets, web services) and, once the data is converted to triples, transform it with SPARQL queries. TopBraid’s SPARQLMotion scripting language makes it possible to chain a series of retrievals and transformations, with programming logic such as conditional branching and looping if necessary.
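SPARQLMotion scripts are built inside TopBraid, so the following is not SPARQLMotion. It is a minimal, standard-library-only Python sketch of the general retrieve, convert, and transform chain described above, with in-memory CSV text standing in for a spreadsheet. Where TopBraid would run SPARQL queries for the transformation step, this sketch substitutes a plain Python function; the http://example.com/ namespace and the column names are hypothetical.

```python
import csv
import io

# Hypothetical namespace for generated resources and properties.
EX = "http://example.com/"


def retrieve(csv_text):
    """Retrieval step: read rows from CSV text (a stand-in for a spreadsheet)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def to_triples(rows):
    """Conversion step: turn each row into (subject, predicate, object) triples."""
    triples = []
    for row in rows:  # looping over the retrieved records
        subject = EX + "person/" + row["id"]
        for column, value in row.items():
            if column != "id" and value:  # skip the key column and empty cells
                triples.append((subject, EX + column, value))
    return triples


def transform(triples):
    """Transformation step (plain Python here, not SPARQL): tidy up names."""
    result = []
    for s, p, o in triples:
        if p == EX + "name":  # conditional branch: only names are normalized
            o = o.strip().title()
        result.append((s, p, o))
    return result


def pipeline(csv_text):
    """Chain the steps, much as a script chains retrievals and transformations."""
    return transform(to_triples(retrieve(csv_text)))


data = "id,name,instrument\n1,  joe smith ,guitar\n2,ann lee,\n"
for triple in pipeline(data):
    print(triple)
```

Keeping each step as a separate function makes it easy to reorder, branch around, or swap out a step, which is the kind of flexibility a scripting layer over retrievals and transformations is meant to provide.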

Q: How about shared databases/triplestores and conflicting access? What synchronization events and constructs are available?

A: SPARQL 1.0 is only a query language, so conflicting access is not an issue there. SPARQL 1.1 does allow updating of data, and in a multi-user situation the implementation of ACID transactions would be a feature of the triplestore implementation. (In other words, SPARQL itself offers no commit/rollback features as part of the language.)
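To illustrate where commit and rollback live, here is a minimal Python sketch of a hypothetical in-memory triplestore (not any real product’s API). It applies a batch of additions atomically under a lock and simply discards its working copy if anything fails. The additions stand in for what a SPARQL 1.1 update request would ask for; the locking and rollback belong to the store, not to the query language. Only atomicity and isolation are sketched; durability is out of scope.

```python
import threading
from contextlib import contextmanager


class ToyTripleStore:
    """Hypothetical in-memory store: atomicity and isolation live here,
    in the store, not in the query language."""

    def __init__(self):
        self._triples = set()
        self._lock = threading.Lock()  # one writer (or reader) at a time

    @contextmanager
    def transaction(self):
        """Yield a working copy; commit it only if the block finishes cleanly."""
        with self._lock:
            working = set(self._triples)
            yield working
            # Reached only when the with-block raised no exception: commit.
            self._triples = working

    def triples(self):
        """Return a snapshot of the committed triples."""
        with self._lock:
            return set(self._triples)


store = ToyTripleStore()

# A successful update, roughly what an INSERT DATA request would ask for.
with store.transaction() as t:
    t.add(("ex:joe", "rdf:type", "ex:Musician"))

# A failed update: the exception skips the commit, so nothing is kept.
try:
    with store.transaction() as t:
        t.add(("ex:ann", "rdf:type", "ex:Musician"))
        raise RuntimeError("simulated failure mid-update")
except RuntimeError:
    pass

print(store.triples())  # only the ex:joe triple remains
```

Working on a copy and swapping it in at the end is about the simplest way to get all-or-nothing behavior; real triplestores use logging, locking, or multiversioning schemes of their own.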

Q: Where can we find more information about the SPARQL rules?

A: On the W3C site: the SPIN Overview and Motivation document (“SPIN” is the syntax underlying SPARQL Rules) and the SPIN spec. SPIN also has its own home page. TopQuadrant’s Holger Knublauch has written a lot about it on his blog, and ProXML’s Paul Hermans has written some great pieces on it (part 1, part 2, part 3, part 4). I explained in more detail how to find the AGROVOC SKOS constraint violations on the TopQuadrant blog at How to: Find SKOS constraint violations in AGROVOC with SPARQL Rules.

Q: Can you compare SPARQL rules like SPIN with the similar functions in OWL 2?

A: At TopQuadrant, we have implemented OWL 2 RL using SPIN, if that gives you some idea of its power. It’s difficult for me to generalize about how SPIN compares with the other OWL profiles, but I can get you in touch with people at TopQuadrant with much deeper OWL background than my own.

Q: How much “bridging ontology” work is needed, in real life, to get value from the LDW sites in the graph you showed? Are they coordinated on predicates and entity URIs? If not, how do you see this being done at scale as SW tech and its adoption mature?

A: The bridging work is often done with SPARQL CONSTRUCT queries, much as XSLT stylesheets drive the transformations that coordinate aggregated XML. Having predicates and/or entity URIs in common between the two datasets being merged is the easiest way to do it, but sometimes creating this equivalency takes manipulation of string versions of URIs, e.g. to turn http://companyX.com/foobar and http://companyY.com/some/path/foobar into the same thing. The owl:sameAs predicate can also help here. Scaling is often the point where such applications move from open source implementations to commercial ones, which can apply more power to the transformation.
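The URI-matching idea can be sketched in a few lines of Python. This is a minimal sketch that assumes, hypothetically, that the last path segment of a URI identifies the entity (so http://companyX.com/foobar and http://companyY.com/some/path/foobar both reduce to foobar) and emits an owl:sameAs triple for each match. In practice this might be written as a SPARQL CONSTRUCT query using string functions, and real datasets usually need a more careful matching key.

```python
from urllib.parse import urlparse

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def local_key(uri):
    """Assumed matching key: the last non-empty segment of the URI path."""
    segments = [s for s in urlparse(uri).path.split("/") if s]
    return segments[-1] if segments else None


def same_as_links(uris_a, uris_b):
    """Pair URIs from two datasets whose keys match, as owl:sameAs triples."""
    index = {}
    for uri in uris_b:
        key = local_key(uri)
        if key is not None:
            index.setdefault(key, []).append(uri)
    links = []
    for uri in uris_a:
        for match in index.get(local_key(uri), []):
            links.append((uri, OWL_SAME_AS, match))
    return links


company_x = ["http://companyX.com/foobar", "http://companyX.com/widget"]
company_y = ["http://companyY.com/some/path/foobar", "http://companyY.com/other"]
for link in same_as_links(company_x, company_y):
    print(link)
```

Indexing one dataset by key keeps the matching roughly linear in the number of URIs, which matters once the datasets being bridged get large.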

Q: So, SPARQL doesn’t do any reasoning even using RDFS and not OWL?

A: SPARQL itself is a query language that asks for information from triples matching certain patterns. A SPARQL engine working with a reasoner can take inferred triples into account. For example, if Joe is a member of class Musician, and Musician is a subclass of Person, and we ask for all members of class Person, an inference engine can add in a triple for each Musician saying that he or she is also a member of class Person so that the query will return that person’s URI as well. SPARQL 1.1’s property paths do make it easier to build this kind of reasoning right into the query.
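To make the Musician/Person example concrete, here is a minimal, standard-library-only Python sketch (not a SPARQL engine or any real reasoner’s API). It stores triples as tuples, using prefixed names such as ex:Joe as stand-ins for full URIs, and contrasts two approaches: materializing inferred rdf:type triples with a simplified subset of the RDFS rules before querying, and following rdfs:subClassOf links at query time, which is roughly what a SPARQL 1.1 property path such as rdf:type/rdfs:subClassOf* expresses.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

data = {
    ("ex:Joe", RDF_TYPE, "ex:Musician"),
    ("ex:Ann", RDF_TYPE, "ex:Person"),
    ("ex:Musician", SUBCLASS_OF, "ex:Person"),
}


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts like a SPARQL variable."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def infer_types(triples):
    """Forward chaining with two RDFS rules: subClassOf is transitive, and
    instances of a subclass are also instances of its superclasses."""
    inferred = set(triples)
    changed = True
    while changed:
        new = set()
        for (a, _, b) in match(inferred, p=SUBCLASS_OF):
            for (_, _, c) in match(inferred, s=b, p=SUBCLASS_OF):
                new.add((a, SUBCLASS_OF, c))
            for (x, _, _) in match(inferred, p=RDF_TYPE, o=a):
                new.add((x, RDF_TYPE, b))
        changed = not new <= inferred  # stop once nothing new is derived
        inferred |= new
    return inferred


def superclasses(cls, triples):
    """Classes reachable via zero or more subClassOf steps (like subClassOf*)."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for (_, _, parent) in match(triples, s=current, p=SUBCLASS_OF):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def instances_via_path(cls, triples):
    """Query-time approach: ?x rdf:type/rdfs:subClassOf* cls, nothing stored."""
    return sorted({x for (x, _, c) in match(triples, p=RDF_TYPE)
                   if cls in superclasses(c, triples)})


# Without inference, only Ann is explicitly typed as ex:Person.
print(sorted(s for (s, _, _) in match(data, p=RDF_TYPE, o="ex:Person")))
# With materialized inferences, Joe is returned as well.
print(sorted(s for (s, _, _) in match(infer_types(data), p=RDF_TYPE, o="ex:Person")))
# The property-path style query reaches the same answer at query time.
print(instances_via_path("ex:Person", data))
```

Materializing inferences makes later queries simple at the cost of storing extra triples; the property-path style keeps the data as-is and does the work inside each query.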

Q: Is SPARQL a semantic query language? If so, why?

A: It depends what you mean by “semantic query.” I think that my previous answer best addresses how semantics can play a role in a query–for example, that a Musician is a type of Person.

Q: Can you recommend any combination of SPARQL + reasoner in one package?

A: Jena includes some OWL reasoning, and the Pellet reasoner can answer SPARQL queries. (I’ve written about how to do this from the command line with no Java coding here.) The Racer reasoner can answer SPARQL queries, but I’ve never played with that.

Q: What’s the current “correct” way of handling time change in RDF? (Temporal RDF)

A: There’s no specific best practice, but a Google search on rdf temporal leads to commercial, academic, and blog discussions of approaches. I’d start with Jeni Tennison’s discussion.

Meet Bob DuCharme in Washington DC November 29-December 1, 2011 for SemTechBiz DC.

Join Bob’s TopQuadrant colleague David Price on September 26-27 in London for SemTechBiz UK.