I sat down with Dr. Alexandre Passant, of DERI, NUI Galway, to discuss his recent research projects, “sparqlPuSH” and “SMOB.” The first, sparqlPuSH, combines Google’s PubSubHubbub, SPARQL, and SPARQL Update to provide proactive notifications from RDF stores, and it was designed to be plugged in on top of an existing RDF store. SMOB is a distributed microblogging system that reuses similar principles to enable privacy in microblogging; it won a Google Research Award. The full specification of sparqlPuSH can be found on Google Code.

Passant began by discussing the real-time web and applications that rely on citizen sensing. Practical applications are being developed that combine sensors and social data to solve real-world problems. Passant said, “The web is now being looked at like a large information stream. We can use this stream to build real-time semantic applications.” He cited a recent WWW2010 paper, “Earthquake Shakes Twitter Users: Real-time Event Detection by Social Sensors,” in which the authors describe how they semantically analyze tweets and treat each user as a “sensor.”

Passant described two approaches for getting updates from websites: a “pull” approach and a “push” approach. In the pull approach, subscribers repeatedly ask the website for updates, say every minute, to see what is new. In the push approach, the website itself notifies subscribers when new updates are available. Passant described the push model as “Wait. Receive. Consume.”
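
To make the contrast concrete, here is a minimal Python sketch. The `Site` and `PollingClient` classes are hypothetical illustrations, not part of sparqlPuSH or any real library. The polling client spends a request on every interval whether or not anything changed, while the push subscriber registers once and then simply waits for the site to call it.

```python
from typing import Callable, List


class Site:
    """A toy website that keeps a list of posts (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.posts: List[str] = []
        self._subscribers: List[Callable[[str], None]] = []

    def posts_since(self, index: int) -> List[str]:
        # Pull: answer "what is new since post number `index`?"
        return self.posts[index:]

    def subscribe(self, callback: Callable[[str], None]) -> None:
        # Push: a client registers once and then waits.
        self._subscribers.append(callback)

    def publish(self, post: str) -> None:
        self.posts.append(post)
        # Push: the site notifies every subscriber as soon as a post appears.
        for callback in self._subscribers:
            callback(post)


class PollingClient:
    """Pull model: asks the site for new posts on every polling interval."""

    def __init__(self, site: Site) -> None:
        self.site = site
        self.seen = 0
        self.requests = 0

    def poll(self) -> List[str]:
        # Every poll costs one request, whether or not anything changed.
        self.requests += 1
        new_posts = self.site.posts_since(self.seen)
        self.seen += len(new_posts)
        return new_posts


site = Site()
pull_client = PollingClient(site)
pushed: List[str] = []
site.subscribe(pushed.append)  # wait ...

site.publish("first post")     # ... receive and consume
polled = [pull_client.poll() for _ in range(3)]  # three polling intervals
```

Only the first of the three polls returns anything; the other two are wasted requests, which is exactly the overhead the push model avoids.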

To implement the push approach, Google developed a protocol called PubSubHubbub. PubSubHubbub lets users subscribe to a website that publishes an Atom/RSS feed and then receive all future updates broadcast from that site. Figure 1.1 below shows how this works. The model is simple: publishers push updates to a hub, and the hub broadcasts each update to all subscribers. Rather than handling millions of polling requests from subscribers, the publisher pushes each update to the hub once and lets the hub distribute it. On the client side, subscribers wait for updates from the hub instead of constantly polling for data.

Figure 1.1 – Example of a subscriber getting an update from PubSubHubbub
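
The hub's fan-out can be sketched in a few lines of Python. This is an in-memory model under simplifying assumptions: the real protocol runs over HTTP, with subscribers sending form parameters such as `hub.mode`, `hub.topic` and `hub.callback` to the hub, and it adds subscription verification, all of which is omitted here. The `Hub` class and the feed URL are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Hub:
    """In-memory stand-in for a PubSubHubbub hub (no HTTP, no verification)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)
        self.pings_received = 0

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        # Real protocol: the subscriber POSTs hub.mode=subscribe,
        # hub.topic=<feed URL> and hub.callback=<its own URL> to the hub.
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, content: str) -> int:
        # The publisher contacts the hub once per update ...
        self.pings_received += 1
        # ... and the hub fans the content out to every subscriber of the topic.
        for callback in self._subscribers[topic]:
            callback(content)
        return len(self._subscribers[topic])


hub = Hub()
topic = "http://example.org/feed.atom"  # hypothetical feed URL
inboxes: List[List[str]] = [[] for _ in range(1000)]
for inbox in inboxes:
    hub.subscribe(topic, inbox.append)

delivered = hub.publish(topic, "<entry>new post</entry>")
```

With 1,000 subscribers, the publisher still contacts the hub only once per update; the hub absorbs the cost of delivery.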

Passant then described a new architecture, based on the push approach, that is intended for semantic web applications and in which triggers are defined semantically using SPARQL. His team developed an interface called “sparqlPuSH” that can be plugged in on top of any RDF store. It lets you register queries against a SPARQL endpoint; as soon as an update matches one of your queries, you receive a notification via an Atom or RSS feed. The code for this project is hosted on Google Code.
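
What it means for an update to "match" a registered query can be illustrated with a toy matcher for basic graph patterns, the triple-pattern core of a SPARQL `WHERE` clause. This is a hypothetical Python sketch, not sparqlPuSH's own code: terms starting with `?` are variables, triples are plain string tuples, and the `ex:` names are made up. A real deployment would delegate this work to the RDF store's SPARQL engine.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]


def match_pattern(pattern: Triple, triple: Triple, bindings: Bindings) -> Optional[Bindings]:
    """Unify one triple pattern with one triple; terms starting with '?' are variables."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.get(term, value) != value:
                return None  # variable already bound to a different value
            result[term] = value
        elif term != value:
            return None  # constant term does not match
    return result


def evaluate(patterns: List[Triple], store: List[Triple]) -> List[Bindings]:
    """Evaluate a basic graph pattern by joining its triple patterns left to right."""
    solutions: List[Bindings] = [{}]
    for pattern in patterns:
        next_solutions: List[Bindings] = []
        for bindings in solutions:
            for triple in store:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Hypothetical registered query, roughly:
#   SELECT ?post WHERE { ?post ex:creator ex:alex . ?post ex:topic ex:sparql }
registered = {
    "alex-on-sparql": [
        ("?post", "ex:creator", "ex:alex"),
        ("?post", "ex:topic", "ex:sparql"),
    ]
}

store: List[Triple] = [
    ("ex:post1", "ex:creator", "ex:alex"),
    ("ex:post1", "ex:topic", "ex:sparql"),
    ("ex:post2", "ex:creator", "ex:alex"),
    ("ex:post2", "ex:topic", "ex:music"),
]
before = evaluate(registered["alex-on-sparql"], store)

# An update adds a triple; post2 now matches the registered query too.
store.append(("ex:post2", "ex:topic", "ex:sparql"))
after = evaluate(registered["alex-on-sparql"], store)
```

Before the update only `ex:post1` satisfies both patterns; the added triple makes `ex:post2` a new answer, which is the kind of event a push notification would report.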

Figure 1.2 shows how sparqlPuSH interfaces with Google’s PubSubHubbub. First, the client registers the query it wants to monitor with the RDF store (step 1 in Figure 1.2). This creates an RSS feed that is returned to the client (step 4 in Figure 1.2). The PuSH Hub in Figure 1.2 is Google’s PubSubHubbub.

Figure 1.2 – How Google’s PubSubHubbub interfaces with the sparqlPuSH Interface
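
The feed handed back to the client in step 4 could look something like the following. This Python sketch, using only the standard library, builds an Atom feed for one registered query; the function names and URLs are hypothetical, and sparqlPuSH's actual feed layout may differ. The `<link rel="hub">` element is how PubSubHubbub subscribers discover which hub to subscribe to for a given feed.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import List, Optional

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def add(parent: ET.Element, tag: str, text: Optional[str] = None, **attrs: str) -> ET.Element:
    """Append an Atom child element with optional text and attributes."""
    node = ET.SubElement(parent, f"{{{ATOM_NS}}}{tag}", attrs)
    if text is not None:
        node.text = text
    return node


def query_feed(feed_url: str, hub_url: str, query: str, results: List[str]) -> bytes:
    """Build an Atom feed listing the current results of one registered query."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    add(feed, "id", feed_url)
    add(feed, "title", f"Results for: {query}")
    add(feed, "updated", now)
    add(feed, "link", rel="self", href=feed_url)
    # PubSubHubbub discovery: subscribers learn which hub to use from rel="hub".
    add(feed, "link", rel="hub", href=hub_url)
    for i, result in enumerate(results):
        entry = add(feed, "entry")
        add(entry, "id", f"{feed_url}#result-{i}")
        add(entry, "title", result)
        add(entry, "updated", now)
    return ET.tostring(feed, encoding="utf-8")


xml_bytes = query_feed(
    feed_url="http://example.org/sparqlpush/feed/1",  # hypothetical URL
    hub_url="http://hub.example.org/",                # hypothetical hub
    query="SELECT ?post WHERE { ?post ex:topic ex:sparql }",
    results=["ex:post1", "ex:post3"],
)
```

Advertising the hub inside the feed itself means a client only needs the feed URL; it can read the `rel="hub"` link and subscribe there instead of polling the store.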

There is also a user interface, shown in Figure 1.3, called “sparqlPuSH UI,” where users can register new queries and view queries that other users have created.

Figure 1.3 – Example of the sparqlPuSH User Interface

After describing the user interface, Passant discussed how notifications are delivered once a query is registered. Figure 1.4 below illustrates how SPARQL updates sent through the interface are handled. After an update, the system automatically runs the registered queries against the triple store and then sends an update to the PuSH hub, which notifies all subscribers.


Figure 1.4 – Updating clients using the sparqlPuSH User Interface
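
The update path in Figure 1.4 can be sketched as a monitor that re-runs every registered query after each update. One assumption goes beyond the description above: only results that were not present before are pushed, so an update that leaves a query's answers unchanged triggers no notification. Queries are plain Python functions standing in for SPARQL queries, and `notify_hub` is a hypothetical callback standing in for pinging the PuSH hub.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Triple = Tuple[str, str, str]
Query = Callable[[Set[Triple]], FrozenSet[str]]  # stand-in for a SPARQL SELECT


class QueryMonitor:
    """Re-runs registered queries after each update and pushes only new results."""

    def __init__(self, notify_hub: Callable[[str, List[str]], None]) -> None:
        self.store: Set[Triple] = set()
        self.queries: Dict[str, Query] = {}
        self.last_results: Dict[str, FrozenSet[str]] = {}
        self.notify_hub = notify_hub  # hypothetical: would ping the PuSH hub

    def register(self, query_id: str, query: Query) -> None:
        self.queries[query_id] = query
        # Remember the current answers so only later changes trigger a push.
        self.last_results[query_id] = query(self.store)

    def insert(self, triples: List[Triple]) -> None:
        # Stand-in for a SPARQL Update INSERT DATA request sent through the interface.
        self.store.update(triples)
        for query_id, query in self.queries.items():
            current = query(self.store)
            new_results = sorted(current - self.last_results[query_id])
            self.last_results[query_id] = current
            if new_results:
                # Only queries that gained new answers cause a hub notification.
                self.notify_hub(query_id, new_results)


def sparql_posts(store: Set[Triple]) -> FrozenSet[str]:
    # Roughly: SELECT ?post WHERE { ?post ex:topic ex:sparql }
    return frozenset(s for (s, p, o) in store if p == "ex:topic" and o == "ex:sparql")


pings: List[Tuple[str, List[str]]] = []
monitor = QueryMonitor(lambda query_id, new: pings.append((query_id, new)))
monitor.register("sparql-posts", sparql_posts)
monitor.insert([("ex:post1", "ex:topic", "ex:sparql")])
monitor.insert([("ex:post2", "ex:topic", "ex:music")])   # no new match, no ping
monitor.insert([("ex:post3", "ex:topic", "ex:sparql")])
```

Keeping the previous result set per query is what lets the monitor send the hub only the difference rather than the full answer after every update.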


Passant’s presentation was a great example of how to build real-time semantic applications using the push approach and Google’s PubSubHubbub. The full source code is released under the BSD license.

About the Author:
Sean Golliher is the founder and publisher of the peer-reviewed Search Marketing Research Journal, SEMJ.org. Sean holds four engineering patents, a B.S. in physics from the University of Washington in Seattle, and a master’s degree in electrical engineering from Washington State University. He is also president and director of search marketing at Future Farm, Inc., in Bozeman, MT, where he focuses on search marketing and internet research and consults for large companies. He has appeared on and been interviewed by well-known blogs and radio stations such as Clickz.com, Webmasterradio.com, and SEM Synergy, including a SEM Synergy radio interview with representatives from eBay on the future of affiliate marketing. To maintain a competitive edge, he regularly reads search patents and papers and attends search marketing conferences.