When people think about orchestration, they tend to think about centralized efforts built around an Enterprise Service Bus (ESB). Services are published as reusable components that can be stitched together into workflows. This vision of Service-Oriented Architecture (SOA) allows central metrics of use and stability, but it precludes a common use case familiar to Unix users.

The "pipes and filters" model allows anyone to compose small, well-defined, well-tested tools into a local, efficient workflow that requires no special blessing. There is no publication step per se. With a common interface, the output of one step is the input to the next. If you need a custom tool or script (or a collection of them), it is easy to add one. This isn’t to say that the ESBs of the world don’t allow pipes-and-filters models, but they tend to occur "out there" and do require a publication step.

If you are experimenting, have a one-off filter or transformation, don’t care to push things back through central services, face a heavy-handed governance review process, or just want to run your workflows on your own systems and topologies, it is nice to have the opportunity to do so.

Dave Beckett recently highlighted how compelling the combination of Semantic Web and Linked Data with command line interfaces can be. Presented at SXSW, his talk shows off the power of the open source tools he has created and shepherded over the past decade.

There are three main tools at play:

  • Raptor (rapper) for handling low-level parsing and serialization
  • Rasqal (roqet) for handling SPARQL queries
  • Redland (rdfproc) for storage and graph manipulation

There are also bindings for just about any language you might like to use to write custom tools on top of the more general ones.

Things become really interesting when we combine logical, resolvable references to information we care about. RDF provides a common data model to allow information to be merged from a variety of sources (documents, databases, services, native triplestores). Add in the pipes and filters topology with all of our existing (or new) reusable tools and we have quite a nice environment for quick and dirty workflows.
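
As a rough illustration of why a common data model makes merging cheap, the sketch below treats two sources as N-Triples (a line-based format with one triple per line) and merges them by concatenation. The example.org URIs and the function names are hypothetical placeholders, not data from any real service. The sample uses only URIs; blank node labels are scoped to a single document, so naive concatenation would not be safe for data that contains them.

```shell
# Hypothetical triples standing in for two independent sources
# (for example, a document and a database export).
source_a() {
cat <<'EOF'
<http://example.org/book/1> <http://purl.org/dc/terms/title> "Example Book" .
<http://example.org/book/1> <http://purl.org/dc/terms/creator> <http://example.org/person/1> .
EOF
}

source_b() {
cat <<'EOF'
<http://example.org/person/1> <http://xmlns.com/foaf/0.1/name> "Example Author" .
<http://example.org/book/1> <http://purl.org/dc/terms/title> "Example Book" .
EOF
}

# Merging is concatenation; sort -u drops the triple that both sources assert.
merge_graphs() { { source_a; source_b; } | sort -u; }

merge_graphs
```

The merged graph now links the book to its author's name, even though no single source held both facts.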

If you install the Redland tools (I did ‘sudo port install redland’ on Snow Leopard), you should be able to run the following examples.

RDFa is finding its way into more and more documents. To extract the content from an existing document:

rapper -g -q http://oreilly.com/catalog/9780596801687/

<http://oreilly.com/catalog/9780596801687/> <http://www.w3.org/1999/xhtml/vocab#stylesheet> 
   <http://oreilly.com/catalog/assets/catalog_page.css?2> .
<http://oreilly.com/catalog/9780596801687/> <http://www.w3.org/1999/xhtml/vocab#stylesheet> 
   <http://oreilly.com/catalog/assets/jquery.lightbox-0.5.css> .
<http://oreilly.com/catalog/9780596801687/> <http://www.w3.org/1999/xhtml/vocab#stylesheet> 
   <http://t.p.mybuys.com/css/mbstyles.css> .
<https://epoch.oreilly.com/shop/cart.orm?prod=9780596801687.BOOK> 
   <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
   <http://purl.org/goodrelations/v1#Offering> .
_:bnode1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
   <http://purl.org/goodrelations/v1#TypeAndQuantityNode> .
_:bnode1 <http://purl.org/goodrelations/v1#amountOfThisGood> 
   "1"^^<http://www.w3.org/2001/XMLSchema#float> .
_:bnode1 <http://purl.org/goodrelations/v1#typeOfGood> 
   <urn:x-domain:oreilly.com:product:9780596801687.BOOK> .
<https://epoch.oreilly.com/shop/cart.orm?prod=9780596801687.BOOK> 
   <http://purl.org/goodrelations/v1#includesObject> _:bnode1 .
<https://epoch.oreilly.com/shop/cart.orm?prod=9780596801687.BOOK> 
   <http://www.w3.org/2000/01/rdf-schema#label> "        Print" .
_:bnode3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
   <http://purl.org/goodrelations/v1#UnitPriceSpecification> .
_:bnode3 <http://purl.org/goodrelations/v1#hasCurrency> "USD" .
_:bnode3 <http://purl.org/goodrelations/v1#hasCurrencyValue> 
   "39.99"^^<http://www.w3.org/2001/XMLSchema#float> .
.
.
.
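
Because N-Triples puts one triple on each line, ordinary Unix filters can work on this output directly. The following is a minimal sketch, assuming rapper emits each triple on a single line (the listing above appears wrapped for readability). The function names are made up for illustration, and the sample input is a handful of triples copied from the output above so that the pipeline runs without network access.

```shell
# A few triples copied from the rapper output above, unwrapped to one per line.
sample_triples() {
cat <<'EOF'
<http://oreilly.com/catalog/9780596801687/> <http://www.w3.org/1999/xhtml/vocab#stylesheet> <http://oreilly.com/catalog/assets/catalog_page.css?2> .
<http://oreilly.com/catalog/9780596801687/> <http://www.w3.org/1999/xhtml/vocab#stylesheet> <http://oreilly.com/catalog/assets/jquery.lightbox-0.5.css> .
<https://epoch.oreilly.com/shop/cart.orm?prod=9780596801687.BOOK> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/goodrelations/v1#Offering> .
_:bnode3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/goodrelations/v1#UnitPriceSpecification> .
_:bnode3 <http://purl.org/goodrelations/v1#hasCurrency> "USD" .
_:bnode3 <http://purl.org/goodrelations/v1#hasCurrencyValue> "39.99"^^<http://www.w3.org/2001/XMLSchema#float> .
EOF
}

# Filter: drop the XHTML vocabulary triples (stylesheet links and similar page chrome).
drop_xhtml_vocab() { grep -v '<http://www.w3.org/1999/xhtml/vocab#'; }

# Filter: the predicate is the second whitespace-separated field of each triple.
predicates() { awk '{print $2}'; }

# Count how often each remaining predicate occurs.
sample_triples | drop_xhtml_vocab | predicates | sort | uniq -c

# With the Redland tools installed, the same filters should apply to live output:
#   rapper -g -q http://oreilly.com/catalog/9780596801687/ | drop_xhtml_vocab | predicates | sort | uniq -c
```

Each function is a small filter with text in and text out, so it can be reordered or swapped for a custom script without touching the rest of the pipeline.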
 

or from the RDF extracted from a document through a service like RDFa Distiller (tinyurl’ed for publication) and converted to the Turtle format:

rapper -g -o turtle -q http://tinyurl.com/y9pvaye

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix cc: <http://creativecommons.org/ns#> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xhv: <http://www.w3.org/1999/xhtml/vocab#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .

<http://www.flickr.com/photos/danja/>
    foaf:name "danja." .

<http://www.flickr.com/photos/danja/1767174>
    dc:title "Amazon for cats #2" ;
    xhv:icon <http://www.flickr.com/favicon.ico> ;
    xhv:stylesheet <http://l.yimg.com/g/css/c_bo_selecta.css.v80386.14>, 
       <http://l.yimg.com/g/css/c_fold_main.css.v86587.64777.80377.14>, 
       <http://l.yimg.com/g/css/c_fold_photo.css.v84694.80992.69785.64777.14>, 
       <http://l.yimg.com/g/css/c_photos_people.css.v80760.14> .

<http://www.flickr.com/photos/danja/archives/date-posted/2004/11/28/>
    dc:date "November 28, 2004" .
 

If we want to start to do some SPARQL queries, we could use roqet.

If this query is in a file called sparql.rq (taken from http://librdf.org/query):

PREFIX table: <http://www.daml.org/2003/01/periodictable/PeriodicTable#>
PREFIX xsd:   <http://www.w3.org/2001/XMLSchema#>
SELECT ?name ?symbol ?weight ?number
FROM <http://www.daml.org/2003/01/periodictable/PeriodicTable>

WHERE {
 ?element table:group ?group .
 ?group table:name "Noble gas"^^xsd:string .
 ?element table:name ?name .
 ?element table:symbol ?symbol .
 ?element table:atomicWeight ?weight .
 ?element table:atomicNumber ?number
}

ORDER BY ASC(?name)

then:

roqet sparql.rq

would yield:

roqet: Querying from file sparql.rq
roqet: Query has a variable bindings result
result: [name=string("argon"^^<http://www.w3.org/2001/XMLSchema#string>), 
   symbol=string("Ar"^^<http://www.w3.org/2001/XMLSchema#string>), 
   weight=string("39.948"^^<http://www.w3.org/2001/XMLSchema#float>), 
   number=string("18"^^<http://www.w3.org/2001/XMLSchema#integer>)]
result: [name=string("helium"^^<http://www.w3.org/2001/XMLSchema#string>), 
   symbol=string("He"^^<http://www.w3.org/2001/XMLSchema#string>), 
   weight=string("4.002602"^^<http://www.w3.org/2001/XMLSchema#float>), 
   number=string("2"^^<http://www.w3.org/2001/XMLSchema#integer>)]
result: [name=string("krypton"^^<http://www.w3.org/2001/XMLSchema#string>), 
   symbol=string("Kr"^^<http://www.w3.org/2001/XMLSchema#string>), 
   weight=string("83.798"^^<http://www.w3.org/2001/XMLSchema#float>), 
   number=string("36"^^<http://www.w3.org/2001/XMLSchema#integer>)]
result: [name=string("neon"^^<http://www.w3.org/2001/XMLSchema#string>), 
   symbol=string("Ne"^^<http://www.w3.org/2001/XMLSchema#string>), 
   weight=string("20.1797"^^<http://www.w3.org/2001/XMLSchema#float>), 
   number=string("10"^^<http://www.w3.org/2001/XMLSchema#integer>)]
result: [name=string("radon"^^<http://www.w3.org/2001/XMLSchema#string>), 
   symbol=string("Rn"^^<http://www.w3.org/2001/XMLSchema#string>), 
   weight=string("222"^^<http://www.w3.org/2001/XMLSchema#float>), 
   number=string("86"^^<http://www.w3.org/2001/XMLSchema#integer>)]
result: [name=string("xenon"^^<http://www.w3.org/2001/XMLSchema#string>), 
   symbol=string("Xe"^^<http://www.w3.org/2001/XMLSchema#string>), 
   weight=string("131.293"^^<http://www.w3.org/2001/XMLSchema#float>), 
   number=string("54"^^<http://www.w3.org/2001/XMLSchema#integer>)]
roqet: Query returned 6 results

If you’d like it back as JSON instead:

roqet -r json sparql.rq

or as comma-separated values:

roqet -r csv sparql.rq

or just the second column:

roqet -r csv sparql.rq | awk -F, '{print $2}'
roqet: Querying from file sparql.rq
name
"argon"^^uri(http://www.w3.org/2001/XMLSchema#string)
"helium"^^uri(http://www.w3.org/2001/XMLSchema#string)
"krypton"^^uri(http://www.w3.org/2001/XMLSchema#string)
"neon"^^uri(http://www.w3.org/2001/XMLSchema#string)
"radon"^^uri(http://www.w3.org/2001/XMLSchema#string)
"xenon"^^uri(http://www.w3.org/2001/XMLSchema#string)
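
One more filter can reduce those typed literals to bare values. This is only a sketch: the function names are invented for illustration, and the sample input is the column shown above, copied so the example runs without roqet installed. The sed expression prints only lines that look like a quoted literal followed by a ^^uri(...) datatype, which also discards the header row and any status line that ends up in the stream.

```shell
# The second-column output shown above, copied as sample input.
sample_column() {
cat <<'EOF'
name
"argon"^^uri(http://www.w3.org/2001/XMLSchema#string)
"helium"^^uri(http://www.w3.org/2001/XMLSchema#string)
"krypton"^^uri(http://www.w3.org/2001/XMLSchema#string)
"neon"^^uri(http://www.w3.org/2001/XMLSchema#string)
"radon"^^uri(http://www.w3.org/2001/XMLSchema#string)
"xenon"^^uri(http://www.w3.org/2001/XMLSchema#string)
EOF
}

# Keep the text between the quotes; lines that do not match are not printed.
plain_values() { sed -n 's/^"\(.*\)"\^\^uri(.*)$/\1/p'; }

sample_column | plain_values

# Appended to the pipeline above, this would look like:
#   roqet -r csv sparql.rq | awk -F, '{print $2}' | plain_values
```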

Hopefully it is easy to imagine why combining (existing and new) command line tools with Web-oriented datasets described via Semantic Web technologies is so compelling. There are rich, modern software environments such as NetKernel, Virtuoso and the Talis Platform that provide this vision in a scalable infrastructure, but it is also cool to have that power in your own environment.