Empire is an implementation of the Java Persistence API (JPA) for RDF and the Semantic Web. Rather than being yet another JPA implementation for relational databases, Empire implements JPA for RDF and SPARQL, allowing developers who are familiar with JPA, but not with Semantic Web technologies like RDF, to make an easy transition into this brave new world. JPA is a specification for managing Java objects, most commonly with an RDBMS; it is the industry standard for Java ORMs.

Motivations for Empire

We started Empire, which is available under the terms of the Apache 2.0 open source license, to bridge the gap between an RDBMS-backed web application and the Semantic Web. We built a web application for a customer which used JPA and Hibernate, but we also wanted to provide a SPARQL endpoint so that we could use Pelorus, a faceted browser for RDF and SPARQL. Ideally, we wanted to use a JPA implementation which would operate against an RDF database in support of these requirements. The objective of this article is to walk through some basic uses of Empire to illustrate how it can be used in your application. For the purposes of the article, we’ll present some examples from an application which uses metadata about various O’Reilly books.

Persistence with plain RDF

O’Reilly has recently started publishing their catalog pages with RDFa markup. For example, if you check out the page for "Switching to the Mac" you’d find this RDF embedded in the page:

<rdf:Description rdf:about="urn:x-domain:oreilly.com:product:9780596514129.IP">
    <foaf1:primarySubjectOf rdf:resource="http://oreilly.com/catalog/9780596514129"/>
    <dc1:title xml:lang="en">Switching to the Mac: The Missing Manual, Leopard Edition</dc1:title>
    <dc1:title>Switching to the Mac: The Missing Manual, Leopard Edition</dc1:title>
    <dc1:creator rdf:resource="urn:x-domain:oreilly.com:agent:pdb:350"/>
    <dc1:publisher xml:lang="en">O'Reilly Media / Pogue Press</dc1:publisher>
    <dc1:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2008-02-26</dc1:issued>
    <frbr1:embodiment rdf:resource="urn:x-domain:oreilly.com:product:9780596514129.BOOK"/>
    <frbr1:embodiment rdf:resource="urn:x-domain:oreilly.com:product:9780596802899.EBOOK"/>
    <frbr1:embodiment rdf:resource="urn:x-domain:oreilly.com:product:9780596514129.SAF"/>
    <rdf:type rdf:resource="http://vocab.org/frbr/core#Expression"/>
</rdf:Description>

If you were to create this data by hand using the Sesame API, it would look something like this:

Graph aGraph = new GraphImpl();
URI aBook = aGraph.getValueFactory().createURI("urn:x-domain:oreilly.com:product:9780596514129.IP");
aGraph.add(aBook,
           aGraph.getValueFactory().createURI("http://purl.org/dc/terms/publisher"),
           aGraph.getValueFactory().createLiteral("O'Reilly Media / Pogue Press"));
aGraph.add(aBook,
           aGraph.getValueFactory().createURI("http://purl.org/dc/terms/title"),
           aGraph.getValueFactory().createLiteral("Switching to the Mac: The Missing Manual, Leopard Edition"));
  
// ... setting the additional properties here ...
  
aGraph.add(aBook,
           RDF.TYPE,
           aGraph.getValueFactory().createURI("http://vocab.org/frbr/core#Expression"));

You might have factory classes or constants to represent common concepts such as terms from the FOAF or DC ontologies, but for the most part, creating RDF data is going to look quite similar to this. While this is a perfectly functional example, you might find a few issues with it. First, this code does not look "natural"; that is, it does not represent what is actually going on in an easily discernible way. It doesn’t really look like we’re creating some data about a book in the O’Reilly catalog. Second, it has locked us into a particular RDF API: this is Sesame code, and it’s a non-trivial task to transition it to another API. Third, the code is only going to make sense to someone who is familiar with RDF; it exposes a lot of RDF minutiae to the developer, which is only going to steepen the learning curve for new developers.

What we want are simple Java beans to represent the concepts in our application. Such code is easier to create and maintain, does not leak RDF specifics into the codebase, and does not tie us to any particular RDF API.

Consider the following example:

Book aBook = new Book();
aBook.setTitle("Switching to the Mac: The Missing Manual, Leopard Edition");
aBook.setPublisher("O'Reilly Media / Pogue Press");
aBook.setIssueDate("2008-02-26");
// And so on...

This code is much easier to work with: it is clearer about what it is trying to accomplish, it succinctly represents our domain, it does not tie us to any API other than our own, and it exposes no RDF details to the programmer. Nearly any developer, Java or otherwise, could look at this code and immediately understand what’s going on. Obviously using Plain Old Java Objects (POJOs) is ideal, but that is only half of the challenge. We still need to save, remove and search for our data, and we want it represented as RDF. This is where Empire comes in.

Persistence With Empire

If you’ve used a JPA implementation before, a lot of the following code should be very familiar to you. Mappings between a Java bean and an RDBMS are often controlled through the common annotations provided by JPA. You typically begin by declaring that your bean is a JPA entity:

@Entity
public class Book

Empire simply extends this approach by adding an additional annotation to the class to specify its type:

@Namespaces({"frbr", "http://vocab.org/frbr/core#"})
@Entity
@RdfsClass("frbr:Expression")
public class Book

We’ve now mapped instances of the Book class to individuals of the frbr:Expression class. You’ll notice an additional optional annotation, @Namespaces, on the class where we specify namespaces that we’ll use throughout our markup; this allows us to use qnames instead of full URIs. We need to make one last change before we can start mapping the properties of the class to RDF: we need to assert that this book can have an RDF identifier:

// ... annotations ...
public class Book implements SupportsRdfId
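To make the qname shorthand enabled by @Namespaces concrete, here is a minimal sketch of prefix expansion. It is not Empire code; the QNameExpander class and its methods are hypothetical names invented for this illustration, and the only assumption carried over from the article is the alternating prefix/URI layout of the annotation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper, not part of Empire: expands a qname such as
// "frbr:Expression" into a full URI using a prefix-to-namespace table.
final class QNameExpander {
    private final Map<String, String> namespaces = new LinkedHashMap<>();

    // Accepts alternating prefix/URI pairs, mirroring the layout of @Namespaces.
    QNameExpander(String... prefixUriPairs) {
        if (prefixUriPairs.length % 2 != 0) {
            throw new IllegalArgumentException("Expected alternating prefix/URI pairs");
        }
        for (int i = 0; i < prefixUriPairs.length; i += 2) {
            namespaces.put(prefixUriPairs[i], prefixUriPairs[i + 1]);
        }
    }

    // Returns the expanded URI, or the input unchanged when its prefix is not
    // registered (for example, when the input is already a full URI).
    String expand(String qname) {
        int colon = qname.indexOf(':');
        if (colon < 0) {
            return qname;
        }
        String namespace = namespaces.get(qname.substring(0, colon));
        return namespace == null ? qname : namespace + qname.substring(colon + 1);
    }

    public static void main(String[] args) {
        QNameExpander expander = new QNameExpander("frbr", "http://vocab.org/frbr/core#");
        System.out.println(expander.expand("frbr:Expression"));
    }
}
```

With the frbr prefix registered, "frbr:Expression" expands to the same URI that appears in the rdf:type statement of the catalog data, while strings with an unregistered prefix pass through untouched.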

In Empire it’s easier to work with named individuals than anonymous ones, but Empire supports both and provides built-in handlers for keeping them straight. You never have to worry about setting or creating ids. Now we need to map the properties of our Java bean to the properties of our instances of the Book in our database. Typically, using Hibernate, Toplink or another JPA implementation, standard properties are very easy: you just declare them:

private String title;
private String publisher;
private Date issued;

These three fields will get persisted in the database when you save your Book object. If you have a collection of items, you’ll just need to specify some basics of the mapping:

@OneToMany(fetch = FetchType.LAZY,
           cascade = {CascadeType.PERSIST, CascadeType.MERGE})
private Collection<Manifestation> mEmbodiments = new HashSet<Manifestation>();

Empire only requires a little bit more information; namely, it needs to know what property each field in your bean corresponds to:

@RdfProperty("dc:title")
private String title;
  
@RdfProperty("dc:publisher")
private String publisher;
  
@RdfProperty("dc:issued")
private Date issued;
  
@RdfProperty("frbr:embodiment")
@OneToMany(fetch = FetchType.LAZY, 
           cascade = {CascadeType.PERSIST, CascadeType.MERGE})
private Collection<Manifestation> mEmbodiments = new HashSet<Manifestation>();

With these simple additional annotations, the Java bean can now be used with Empire.
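For readers curious how annotations like these can drive persistence, the following sketch reads a field-level annotation by reflection and emits one simplified triple per annotated field. It is only an illustration of the general technique, not Empire's implementation: DemoRdfProperty is a stand-in annotation declared locally so the snippet compiles on its own, and DemoBook and DemoTripleMapper are likewise hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: turns annotated bean fields into simplified triples.
final class DemoTripleMapper {
    // Produces one "subject predicate "value"" line per annotated, non-null field.
    // Literal escaping and datatypes are deliberately left out of this sketch.
    static List<String> toTriples(String subject, Object bean) {
        List<String> triples = new ArrayList<>();
        for (Field field : bean.getClass().getDeclaredFields()) {
            DemoRdfProperty mapping = field.getAnnotation(DemoRdfProperty.class);
            if (mapping == null) {
                continue; // unannotated fields are not persisted
            }
            try {
                field.setAccessible(true);
                Object value = field.get(bean);
                if (value != null) {
                    triples.add(subject + " " + mapping.value() + " \"" + value + "\"");
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return triples;
    }

    public static void main(String[] args) {
        for (String triple : toTriples("<urn:example:book:1>", new DemoBook())) {
            System.out.println(triple);
        }
    }
}

// Local stand-in for a property-mapping annotation. It is NOT Empire's @RdfProperty.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface DemoRdfProperty {
    String value();
}

// Hypothetical bean used only for this illustration.
class DemoBook {
    @DemoRdfProperty("dc:title")
    private String title = "How to Use Empire";

    @DemoRdfProperty("dc:publisher")
    private String publisher = "Clark & Parsia";

    // Not annotated, so the mapper ignores it.
    private String internalNote = "draft";
}
```

The point of the design is that the bean itself stays free of any RDF API calls; everything the mapper needs is carried by the annotation metadata.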

Using Empire

Initializing Empire is trivial: you simply need to declare which API bindings you’d like to load. The following example shows how to load the support for Sesame, which allows Empire to connect to Sesame repositories. You can load multiple bindings at once and have different persistence contexts connected to databases of different types, while still maintaining the same public API:

Empire.init(new OpenRdfEmpireModule());

Here we use the standard JPA framework to grab an instance of our persistence context named ‘oreilly’. The resulting EntityManager will be connected to the Sesame repository specified in our configuration:

EntityManager aManager = Persistence.createEntityManagerFactory("oreilly")
                                    .createEntityManager();

The following shows how to retrieve a specific item, in this case a book, from the database and print some of its data:

Book aBook = aManager.find(Book.class, URI.create("urn:x-domain:oreilly.com:product:9780596514129.IP"));
  
// prints: Switching to the Mac: The Missing Manual, Leopard Edition
System.err.println(aBook.getTitle());
  
// prints: O'Reilly Media / Pogue Press
System.err.println(aBook.getPublisher());

This shows how to create a new Book and save it to the database:

Book aNewBook = new Book();
aNewBook.setIssued(new Date());
aNewBook.setTitle("How to Use Empire");
aNewBook.setPublisher("Clark & Parsia");
aNewBook.setPrimarySubjectOf(URI.create("http://github.com/clarkparsia/Empire"));
  
// grab the ebook manifestation
Manifestation aEBook = aManager.find(Manifestation.class, URI.create("urn:x-domain:oreilly.com:product:9780596104306.EBOOK"));
  
// and we'll use it as the embodiment of our new book.
aNewBook.setEmbodiments(Arrays.asList(aEBook));
  
// save the new book to the database
aManager.persist(aNewBook);

Here we show that finding the same object in the database yields an instance which is .equals() to our original copy:

Book aNewBookCopy = aManager.find(Book.class, aNewBook.getRdfId());
  
// true!
System.err.println(aNewBook.equals(aNewBookCopy));

Additionally, we then make some edits to our original and save them back into the database. Our copy remains unchanged and is a snapshot of the state of the book at the time we retrieved it. This also shows how attributes on the JPA annotations can control the persistence behavior; in this case, how persistence is cascaded between objects:

// let's edit our book...maybe we changed the title and published as a PDF
aNewBook.setTitle("Return of the Empire");
  
// create a new manifestation
Manifestation aPDFManifestation = new Manifestation();
aPDFManifestation.setIssued(new Date());
// set the dc:type attribute
aPDFManifestation.setType(URI.create("http://purl.oreilly.com/product-types/PDF"));
          
aNewBook.setEmbodiments(Arrays.asList(aPDFManifestation));
          
// now save our edits
aManager.merge(aNewBook);
          
// print the new information we just saved
System.err.println(aNewBook.getTitle());
System.err.println(aNewBook.getEmbodiments());
          
// and importantly, verify that the new manifestation was also saved due to the cascaded merge operation
// specified in the Book class via the @OneToMany annotation
          
// true!
System.err.println(aManager.contains(aPDFManifestation));
          
// the copy of the book contains the old information
System.err.println(aNewBookCopy.getTitle());
System.err.println(aNewBookCopy.getEmbodiments());

We can always refresh a "stale" object with the latest data from the database.

// but can be refreshed...
aManager.refresh(aNewBookCopy);
         
// and now contains the correct, up-to-date information
System.err.println(aNewBookCopy.getTitle());
System.err.println(aNewBookCopy.getEmbodiments());

Here is an example of removing an object from the database; it again demonstrates how persistence operations are controlled through the JPA annotations:

// now we can delete our new book
aManager.remove(aNewBook);
          
// false!
System.err.println(aManager.contains(aNewBook));
          
// but the new manifestation still exists, since we did not specify that deletes should cascade...
          
// true!
System.err.println(aManager.contains(aPDFManifestation));

A final example demonstrates how standard JPA parameterized queries can be used with normal SPARQL to query the database:

// Lastly, we can use the query API to run arbitrary sparql queries
// create a jpql-style partial SPARQL query (JPQL is currently unsupported)
Query aQuery = aManager.createQuery("where { ?result frbr:embodiment ?manifest." +
                                    "         ?foo <http://purl.org/goodrelations/v1#typeOfGood> ?manifest . " +
                                    "        ?foo <http://purl.org/goodrelations/v1#hasPriceSpecification> ?price. " +
                                    "        ?price <http://purl.org/goodrelations/v1#hasCurrencyValue> ?value. " +
                                    "        ?price <http://purl.org/goodrelations/v1#hasCurrency> \"USD\"@en." +
                                    "        filter(?value > ??min). }");
  
// this query should return instances of type Book
aQuery.setHint(RdfQuery.HINT_ENTITY_CLASS, Book.class);
  
// set the parameter in the query to the value for the min price
// parameters are prefixed with ?? -- this differs slightly from JPQL
aQuery.setParameter("min", 30);
  
// now execute the query to get the list of all books priced above $30 USD
List aResults = aQuery.getResultList();
  
// 233 results
System.err.println("Num Results:  " + aResults.size());
  
// print the titles of the first five results
for (int i = 0; i < 5; i++) {
    Book aBookResult = (Book) aResults.get(i);
    System.err.println(aBookResult.getTitle());
}
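To give a feel for what binding a ??-prefixed parameter involves, here is a small self-contained sketch that substitutes named markers in a query fragment with literal values. It is an assumption-laden illustration rather than a description of how Empire processes queries; QueryParameterSketch and its bind method are hypothetical names, and the literal formatting is intentionally simplistic.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: replaces ??name markers in a query fragment with literal
// values. This is not Empire's implementation.
final class QueryParameterSketch {
    // Matches two question marks followed by a parameter name.
    private static final Pattern PARAMETER = Pattern.compile("\\?\\?(\\w+)");

    static String bind(String query, Map<String, ?> parameters) {
        Matcher matcher = PARAMETER.matcher(query);
        StringBuffer bound = new StringBuffer();
        while (matcher.find()) {
            String name = matcher.group(1);
            if (!parameters.containsKey(name)) {
                throw new IllegalArgumentException("No value bound for parameter: " + name);
            }
            String literal = toLiteral(parameters.get(name));
            matcher.appendReplacement(bound, Matcher.quoteReplacement(literal));
        }
        matcher.appendTail(bound);
        return bound.toString();
    }

    // Numbers are written bare; everything else becomes a quoted, escaped string.
    private static String toLiteral(Object value) {
        if (value instanceof Number) {
            return value.toString();
        }
        String text = String.valueOf(value).replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + text + "\"";
    }

    public static void main(String[] args) {
        String fragment = "where { ?price ?p ?value . filter(?value > ??min) }";
        System.out.println(bind(fragment, Map.of("min", 30)));
    }
}
```

Because ordinary SPARQL variables start with a single question mark, the doubled prefix lets parameters be told apart from variables such as ?value without any further parsing.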

Features and Support

Empire implements as much of JPA as possible while attempting to retain the expected behavior based on the JPA spec. There are features and portions of JPA that Empire does not yet support, such as @SqlResultSetMapping, and some others that have no correlation to an RDF-based system, such as @Table or @Column.

Configuration of Empire is controlled through simple properties or XML format files loaded at startup. There is no tricky XML mapping language to learn; all mappings are controlled through the standard JPA annotations. The configuration files simply define the connection parameters for your database as well as allow for global properties to be used by all databases.

Empire uses dependency injection via Google Guice to manage its plugin architecture and Javassist for bytecode manipulation: generating instances from interfaces or abstract classes at runtime, and lazily loading resources from the database using method interceptors. This allows Empire to provide an API-agnostic mechanism for working with RDF databases, thus avoiding API and/or database lock-in. Empire provides out-of-the-box support for Sesame, Jena, and 4Store; a future version will add support for Mulgara, BigData, Oracle 11g and Virtuoso.
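The idea of lazy loading through method interceptors can be sketched with nothing but the JDK. The example below deliberately swaps in java.lang.reflect.Proxy for Javassist, so it only works for interfaces and should not be read as Empire's implementation; LazyProxySketch, its Titled interface, and the lazy helper are all hypothetical names, and the Supplier stands in for a database fetch.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.function.Supplier;

// Illustrative only: lazy loading through a method interceptor, written with
// JDK dynamic proxies rather than Javassist. Not Empire's implementation.
final class LazyProxySketch {

    // Hypothetical entity interface used only for this illustration.
    interface Titled {
        String getTitle();
    }

    // Returns a proxy that runs the loader on the first method call and
    // forwards that call, and every later one, to the loaded object.
    static <T> T lazy(Class<T> type, Supplier<? extends T> loader) {
        InvocationHandler handler = new InvocationHandler() {
            private T target; // stays null until the first intercepted call

            @Override
            public synchronized Object invoke(Object proxy, Method method, Object[] args)
                    throws Throwable {
                if (target == null) {
                    target = loader.get(); // stands in for a database fetch
                }
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // surface the real exception to the caller
                }
            }
        };
        Object proxy = Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler);
        return type.cast(proxy);
    }

    public static void main(String[] args) {
        Titled stored = () -> "How to Use Empire";
        Titled book = lazy(Titled.class, () -> {
            System.out.println("loading from the database...");
            return stored;
        });
        System.out.println("proxy created, nothing loaded yet");
        System.out.println(book.getTitle());
        System.out.println(book.getTitle());
    }
}
```

The caller only ever sees the Titled interface, which is what keeps the loading strategy, and the store behind it, out of application code.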

Conclusion

Empire provides a standard, widely known Java persistence framework for use in Semantic Web projects where data is stored in RDF. By providing an implementation of JPA and using it to abstract the minutiae of RDF, it lowers the learning curve for new developers, and helps provide a straightforward path for migrating or enhancing existing traditional web applications to use semantic technologies.

Code Examples

All code examples used in this article can be found in the public Empire source repository on GitHub. Empire support can be found on the mailing list.