Hello, my name is SPARQL
SPARQL is the standardized query language for RDF, in the same way that SQL is the standardized query language for relational databases. If this is the first time you are looking at SPARQL but you are familiar with SQL, you will notice some similarities, because the two share several keywords such as SELECT and WHERE. SPARQL also has keywords that you will not have seen if you come from the SQL world, such as OPTIONAL and FILTER, and many more.

Recall that RDF data is made up of triples, each comprising a subject, a predicate and an object. A SPARQL query consists of a set of triple patterns: triples in which the subject, predicate and/or object can be variables. The idea is to match the triple patterns in the query against the existing RDF triples and find values for the variables (the solutions). A SPARQL query is executed on an RDF dataset, which can be stored in a native RDF database or exposed by a Relational Database to RDF (RDB2RDF) system, such as Ultrawrap. These systems provide SPARQL endpoints, which accept queries and return results over HTTP.
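As a rough illustration of the endpoint idea, the sketch below builds (but does not send) an HTTP GET request in the style of the W3C SPARQL Protocol. The query travels URL-encoded in a `query` parameter, and the `Accept` header asks for JSON results. The endpoint URL is a hypothetical placeholder. A given store may also accept POST or use a different path.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL: replace it with the endpoint of your own RDF store.
ENDPOINT = "http://example.org/sparql"

QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE { ?x foaf:name ?name }"""


def build_query_request(endpoint: str, query: str) -> Request:
    """Build (without sending) a SPARQL Protocol GET request.

    The query is URL-encoded into the 'query' parameter, and the Accept
    header asks for results in the SPARQL JSON results format.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_query_request(ENDPOINT, QUERY)
print(req.get_method(), req.full_url)
# Actually sending it would be: urllib.request.urlopen(req).read()
```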

A basic example

Assume we have the following RDF triples in our database:

:id1 foaf:name "Juan Sequeda"
:id1 foaf:based_near :Austin
:id2 foaf:name "Bob"
:id2 foaf:based_near :Dallas

Suppose we want to find the names of all the people in our database. The SPARQL query would look like this:

SELECT ?name
WHERE {
?x foaf:name ?name
}

Let’s break down this query from the beginning. The query starts with the keyword SELECT, followed by the variables we would like to project, which in this case is just ?name. Note that all variable names begin with a question mark. Next comes the WHERE keyword, followed by a triple pattern between curly braces.

This triple pattern is the most interesting part. Like an RDF triple, it consists of a subject, a predicate and an object, but any of them can be a variable. Here, the subject and the object are variables while the predicate is a constant. The pattern is evaluated against all the RDF triples in your database: constants in the pattern must match the corresponding constants in an RDF triple exactly, while variables can match anything. In our pattern, the only constant is the predicate foaf:name. Out of our four RDF triples, two have foaf:name as their predicate, so these two triples match the pattern, and we get two solutions:

1. ?x = :id1, ?name = "Juan Sequeda"
2. ?x = :id2, ?name = "Bob"

Because our query only selects the values bound to the variable ?name, the final answer is "Juan Sequeda" and "Bob".
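To make the matching procedure concrete, here is a toy model in Python. It is not a real SPARQL engine. Terms are plain strings, and any pattern term starting with `?` is a variable. The names `match_pattern` and `project` are made up for this illustration. Each matching triple yields one solution, and projection keeps only the selected variables.

```python
# Toy model of SPARQL triple-pattern matching (not a real SPARQL engine).
DATA = [
    (":id1", "foaf:name", "Juan Sequeda"),
    (":id1", "foaf:based_near", ":Austin"),
    (":id2", "foaf:name", "Bob"),
    (":id2", "foaf:based_near", ":Dallas"),
]


def is_var(term):
    """In this toy model, pattern terms starting with '?' are variables."""
    return term.startswith("?")


def match_pattern(pattern, triples):
    """Return one solution (variable -> value) per triple matching pattern."""
    solutions = []
    for triple in triples:
        binding = {}
        for p_term, d_term in zip(pattern, triple):
            if is_var(p_term):
                # A variable used twice in a pattern must get the same value.
                if binding.setdefault(p_term, d_term) != d_term:
                    break
            elif p_term != d_term:
                break  # constants must match exactly
        else:
            solutions.append(binding)
    return solutions


def project(solutions, variables):
    """Keep only the selected variables, like the SELECT clause."""
    return [{v: s[v] for v in variables if v in s} for s in solutions]


solutions = match_pattern(("?x", "foaf:name", "?name"), DATA)
print(solutions)
# [{'?x': ':id1', '?name': 'Juan Sequeda'}, {'?x': ':id2', '?name': 'Bob'}]
print(project(solutions, ["?name"]))
# [{'?name': 'Juan Sequeda'}, {'?name': 'Bob'}]
```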

Another example

Now let’s make things a little more complicated. Assume we want to find the names of people who are based near Austin. The SPARQL query would be:

SELECT ?name
WHERE {
?x foaf:name ?name .
?x foaf:based_near :Austin .
}

In this query we have two triple patterns. The first one is the same as in our previous example, and we already know its solutions. Now let’s look at the second triple pattern. Here, the predicate has the constant value foaf:based_near and the object has the constant value :Austin, which together match only one of our RDF triples. The solution is:

?x = :id1

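Now each of the triple patterns in our query has its own solutions. Both patterns share the variable ?x, which means their solutions can be joined: a solution from the first pattern is combined with a solution from the second only if they assign the same value to ?x. Bob’s solution (?x = :id2) has no partner, so the only final solution is: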

?x = :id1, ?name = "Juan Sequeda"
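The join step can be sketched with the same kind of toy Python model, using hypothetical helper names and plain strings as terms. Each triple pattern is matched on its own. Two solutions are merged only when they agree on every shared variable, which here is ?x.

```python
# Toy model of joining triple-pattern solutions (not a real SPARQL engine).
DATA = [
    (":id1", "foaf:name", "Juan Sequeda"),
    (":id1", "foaf:based_near", ":Austin"),
    (":id2", "foaf:name", "Bob"),
    (":id2", "foaf:based_near", ":Dallas"),
]


def is_var(term):
    return term.startswith("?")


def match_pattern(pattern, triples):
    """Return one solution (variable -> value) per triple matching pattern."""
    solutions = []
    for triple in triples:
        binding = {}
        for p_term, d_term in zip(pattern, triple):
            if is_var(p_term):
                if binding.setdefault(p_term, d_term) != d_term:
                    break
            elif p_term != d_term:
                break
        else:
            solutions.append(binding)
    return solutions


def compatible(s1, s2):
    """Solutions are compatible if their shared variables have equal values."""
    return all(s2[v] == value for v, value in s1.items() if v in s2)


def join(left, right):
    """Merge every compatible pair of solutions."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]


def evaluate_bgp(patterns, triples):
    """Match each triple pattern, then join the solution lists one by one."""
    solutions = [{}]  # the empty solution is compatible with everything
    for pattern in patterns:
        solutions = join(solutions, match_pattern(pattern, triples))
    return solutions


result = evaluate_bgp(
    [("?x", "foaf:name", "?name"), ("?x", "foaf:based_near", ":Austin")], DATA
)
print(result)  # [{'?x': ':id1', '?name': 'Juan Sequeda'}]
```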

SPARQL 1.0

SPARQL 1.0, the first version of SPARQL, was standardized as a W3C Recommendation in January 2008. It only allows you to query an RDF database; it does not allow you to insert or update data. Some interesting features:

Result Syntaxes

The results of SPARQL queries can be expressed in different formats. SELECT results, for example, can be returned in the standard SPARQL Query Results XML Format or in JSON. The result of a CONSTRUCT query is always an RDF graph, which can be serialized in any RDF syntax (RDF/XML, N-Triples, etc.).
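For illustration, the sketch below serializes the answer of our first query into the W3C SPARQL JSON results layout. That layout has a `head` listing the variable names (without the `?`) and a `results.bindings` array with one object per solution. The sketch is a simplification that assumes every value is a plain literal; IRIs would use `"type": "uri"` instead. The function name is hypothetical.

```python
import json


def to_sparql_json(variables, solutions):
    """Serialize SELECT solutions whose values are all plain literals.

    Variables are given without the leading '?', as the format expects.
    """
    bindings = []
    for solution in solutions:
        row = {}
        for var in variables:
            if var in solution:  # unbound variables are simply omitted
                row[var] = {"type": "literal", "value": solution[var]}
        bindings.append(row)
    return {"head": {"vars": variables}, "results": {"bindings": bindings}}


doc = to_sparql_json(["name"], [{"name": "Juan Sequeda"}, {"name": "Bob"}])
print(json.dumps(doc, indent=2))
```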

Query for Relationships

If a triple pattern in your query has a variable as its predicate, you can explore the database to discover relationships. For example, the query

SELECT ?p
WHERE {
:John ?p :Mary
}

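returns every predicate that links :John to :Mary, that is, the kinds of relationships between them. That’s not possible in a plain SQL query, where a column name cannot be a variable :)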
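In a toy matching model (plain strings as terms, `?` marking variables), a variable in the predicate position is no different from any other variable, which is exactly why this kind of query works. The triples about :John and :Mary below are invented for the example. That includes the :worksWith predicate.

```python
# Hypothetical data about :John and :Mary, invented for this example.
DATA = [
    (":John", "foaf:knows", ":Mary"),
    (":John", ":worksWith", ":Mary"),
    (":John", "foaf:knows", ":Bob"),
]


def is_var(term):
    return term.startswith("?")


def match_pattern(pattern, triples):
    """Return one solution (variable -> value) per triple matching pattern."""
    solutions = []
    for triple in triples:
        binding = {}
        for p_term, d_term in zip(pattern, triple):
            if is_var(p_term):
                if binding.setdefault(p_term, d_term) != d_term:
                    break
            elif p_term != d_term:
                break  # constants (here :John and :Mary) must match exactly
        else:
            solutions.append(binding)
    return solutions


print(match_pattern((":John", "?p", ":Mary"), DATA))
# [{'?p': 'foaf:knows'}, {'?p': ':worksWith'}]
```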

Transform Data with CONSTRUCT

Through the CONSTRUCT query form, an alternative to SELECT, SPARQL allows you to transform data: the result is an RDF graph instead of a table of results. Imagine you have RDF data that has been generated automatically and you would like to transform it to use well-known vocabularies. For example:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ex: <http://myexample.com/>
CONSTRUCT {
?x foaf:name ?name
}
WHERE {
?x ex:nombre ?name .
}
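Conceptually, CONSTRUCT first finds the solutions of the WHERE clause. It then fills the template with each solution's values and collects the resulting triples into a graph. Below is a toy Python sketch of that idea, restricted to a single triple pattern in the WHERE clause. The `ex:` data and the helper names are hypothetical.

```python
# Hypothetical automatically generated data using the ex: vocabulary.
DATA = [
    ("ex:p1", "ex:nombre", "Juan Sequeda"),
    ("ex:p2", "ex:nombre", "Bob"),
]


def is_var(term):
    return term.startswith("?")


def match_pattern(pattern, triples):
    """Return one solution (variable -> value) per triple matching pattern."""
    solutions = []
    for triple in triples:
        binding = {}
        for p_term, d_term in zip(pattern, triple):
            if is_var(p_term):
                if binding.setdefault(p_term, d_term) != d_term:
                    break
            elif p_term != d_term:
                break
        else:
            solutions.append(binding)
    return solutions


def construct(template, pattern, triples):
    """Instantiate each template triple once per solution of pattern."""
    graph = set()  # an RDF graph is a set of triples, so duplicates collapse
    for solution in match_pattern(pattern, triples):
        for t in template:
            # Skip template triples that would contain an unbound variable.
            if all(not is_var(term) or term in solution for term in t):
                graph.add(tuple(solution.get(term, term) for term in t))
    return graph


new_graph = construct([("?x", "foaf:name", "?name")], ("?x", "ex:nombre", "?name"), DATA)
print(sorted(new_graph))
# [('ex:p1', 'foaf:name', 'Juan Sequeda'), ('ex:p2', 'foaf:name', 'Bob')]
```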

OPTIONAL

An interesting operator in SPARQL is OPTIONAL. If you are coming from the SQL world, it is roughly the equivalent of a LEFT OUTER JOIN. Why do we need it? Consider the following RDF triples:

:id1 foaf:name "Juan Sequeda"
:id1 foaf:based_near :Austin
:id2 foaf:name "Bob"

and the following query:

SELECT ?name ?loc
WHERE {
?x foaf:name ?name .
?x foaf:based_near ?loc .
}

If you are coming from the SQL world and picture a person table with a nullable location column, you might expect two results: {?name = “Juan Sequeda”, ?loc = :Austin} and {?name = “Bob”, ?loc = null}. However, there is no triple with subject :id2 and predicate foaf:based_near, so the second triple pattern has nothing to match for Bob and there is nothing to join on. Additionally, there are no nulls in RDF, so you cannot explicitly state that Bob’s location is null. Therefore, the second result is not possible, and the actual answer is just {?name = “Juan Sequeda”, ?loc = :Austin}. So how do we get Bob into the results as well? This is where OPTIONAL comes in. The query would have to be:

SELECT ?name ?loc
WHERE {
?x foaf:name ?name .
OPTIONAL {?x foaf:based_near ?loc .}
}

This query can be read as: “find all the names and, oh by the way, if there is a foaf:based_near attached, return that too; otherwise, don’t worry about it”. The actual solutions are {?name = “Juan Sequeda”, ?loc = :Austin} and {?name = “Bob”}, where ?loc is simply left unbound for Bob.
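In terms of solutions, OPTIONAL behaves like a left join. Every solution of the required part is kept, and it is extended with the optional part's values only where they are compatible. Below is a toy Python sketch over the three triples above. The helper names are hypothetical, and an unbound variable is simply a missing key.

```python
DATA = [
    (":id1", "foaf:name", "Juan Sequeda"),
    (":id1", "foaf:based_near", ":Austin"),
    (":id2", "foaf:name", "Bob"),
]


def is_var(term):
    return term.startswith("?")


def match_pattern(pattern, triples):
    """Return one solution (variable -> value) per triple matching pattern."""
    solutions = []
    for triple in triples:
        binding = {}
        for p_term, d_term in zip(pattern, triple):
            if is_var(p_term):
                if binding.setdefault(p_term, d_term) != d_term:
                    break
            elif p_term != d_term:
                break
        else:
            solutions.append(binding)
    return solutions


def compatible(s1, s2):
    """Solutions are compatible if their shared variables have equal values."""
    return all(s2[v] == value for v, value in s1.items() if v in s2)


def left_join(left, right):
    """OPTIONAL: keep every left solution, extended by compatible right ones if any."""
    result = []
    for left_sol in left:
        extended = [{**left_sol, **r} for r in right if compatible(left_sol, r)]
        result.extend(extended if extended else [left_sol])
    return result


names = match_pattern(("?x", "foaf:name", "?name"), DATA)
locations = match_pattern(("?x", "foaf:based_near", "?loc"), DATA)

# Plain join (no OPTIONAL): Bob disappears.
inner = [{**a, **b} for a in names for b in locations if compatible(a, b)]
# Left join (OPTIONAL): Bob stays, with ?loc unbound.
outer = left_join(names, locations)
print(inner)
print(outer)
```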

Negation

Negation in SPARQL 1.0 is… weird. There is no dedicated negation operator. Instead, it is based on negation as failure and is expressed by combining OPTIONAL, a FILTER with the bound() function, and the logical-not operator (!). The OPTIONAL pattern binds a variable whenever a triple we want to exclude exists, and the filter keeps only the solutions in which that variable remained unbound. For example, to find the people who don’t have a location in our previous example dataset, the query would be:

SELECT ?name
WHERE {
?x foaf:name ?name .
OPTIONAL {?x foaf:based_near ?loc .}
FILTER(!bound(?loc))
}

and the result is

{?name = “Bob”}
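The OPTIONAL + !bound idiom can be traced in a toy Python model with hypothetical helpers, where an unbound variable is a missing key. First left-join the location pattern. Then keep only the solutions where ?loc stayed unbound, and finally project ?name.

```python
DATA = [
    (":id1", "foaf:name", "Juan Sequeda"),
    (":id1", "foaf:based_near", ":Austin"),
    (":id2", "foaf:name", "Bob"),
]


def is_var(term):
    return term.startswith("?")


def match_pattern(pattern, triples):
    """Return one solution (variable -> value) per triple matching pattern."""
    solutions = []
    for triple in triples:
        binding = {}
        for p_term, d_term in zip(pattern, triple):
            if is_var(p_term):
                if binding.setdefault(p_term, d_term) != d_term:
                    break
            elif p_term != d_term:
                break
        else:
            solutions.append(binding)
    return solutions


def compatible(s1, s2):
    return all(s2[v] == value for v, value in s1.items() if v in s2)


def left_join(left, right):
    """OPTIONAL: keep every left solution, extended by compatible right ones if any."""
    result = []
    for left_sol in left:
        extended = [{**left_sol, **r} for r in right if compatible(left_sol, r)]
        result.extend(extended if extended else [left_sol])
    return result


names = match_pattern(("?x", "foaf:name", "?name"), DATA)
locations = match_pattern(("?x", "foaf:based_near", "?loc"), DATA)
optional = left_join(names, locations)
# FILTER(!bound(?loc)): keep only solutions where ?loc was never bound.
without_location = [s for s in optional if "?loc" not in s]
result = [{"?name": s["?name"]} for s in without_location]  # SELECT ?name
print(result)  # [{'?name': 'Bob'}]
```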

SPARQL 1.1

SPARQL 1.0 is missing many features, and this is where SPARQL 1.1 comes in. The W3C SPARQL Working Group developing it was chartered in 2009. In my opinion, the most important missing features are aggregates, sub-queries, and a natural negation operator. Thankfully, they are being added in SPARQL 1.1, together with other interesting features:

  • Aggregates: ability to group results and calculate aggregate values (e.g. count, min, max, avg, sum, …).
  • Sub-queries: allows a query to be embedded within another.
  • Negation: includes two negation operators: NOT EXISTS (used inside a FILTER) and MINUS
  • Property paths: query arbitrary-length paths through a graph via a regular-expression-like syntax
  • Query Federation: ability to split a single query, send parts of it to different SPARQL endpoints, and then combine the results from each one
  • Projected expressions: ability for query results to contain values derived from constants, function calls, or other expressions in the SELECT list.
  • Update: an update language for RDF
  • Service Description: a vocabulary and discovery mechanism that describes the capabilities of a SPARQL endpoint.
  • Entailment Regimes: defines conditions under which SPARQL queries can be used for inference under RDF, RDF Schema, OWL, or RIF entailment.
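As an example of the regular-expression-like property paths, a pattern such as `:alice foaf:knows+ ?y` asks for everyone reachable from :alice through one or more foaf:knows links. One way to evaluate it is a breadth-first search that remembers visited nodes, so cycles in the data cannot cause an infinite loop. The Python sketch below runs on invented data, where :alice, :bob and :carol are hypothetical.

```python
from collections import deque

# Hypothetical data; note the cycle between :bob and :carol.
DATA = [
    (":alice", "foaf:knows", ":bob"),
    (":bob", "foaf:knows", ":carol"),
    (":carol", "foaf:knows", ":bob"),
]


def one_or_more(start, predicate, triples):
    """Return the nodes reachable from start via one or more predicate edges."""
    edges = {}
    for s, p, o in triples:
        if p == predicate:
            edges.setdefault(s, []).append(o)
    reached = set()  # start is only included if a cycle leads back to it
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)
    return reached


print(sorted(one_or_more(":alice", "foaf:knows", DATA)))  # [':bob', ':carol']
```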

Conclusions

This is a quick introduction to SPARQL, and hopefully you are hungry for more. To learn more about SPARQL, check out:

Finally, also check out the current draft standards from the W3C SPARQL Working Group.

Happy SPARQLing!