Big Graph Data Panel at ISWC 2012

Big Graph Data Panelists (L to R): Mike Stonebraker, John Giannandrea, Bryan Thompson, Tim Berners-Lee, Frank van Harmelen

Last week, the 11th International Semantic Web Conference (ISWC 2012) took place in Boston. It was an exciting week to learn about the advances of the Semantic Web and current applications.

The first two days, Sunday November 11 and Monday November 12, consisted of 18 workshops and 8 tutorials. The following three days (Tuesday November 13 – Thursday November 15) consisted of keynotes, presentation of academic and in-use papers, the Big Graph Data Panel and industry presentations. It is basically impossible to attend all the interesting presentations. Therefore, I am going to try my best to summarize and offer links to everything that I can.

Keynotes

There were three keynotes: Thomas Malone from MIT’s Sloan School of Management; Jeanne Holm, the evangelist at Data.Gov; and Mark Musen from Stanford’s Center for Biomedical Informatics. Malone’s keynote was on the Semantic Web and collective intelligence. His main idea was to represent not only data but also processes on the Semantic Web, which would require ontologies to have verbs, not only nouns. Holm’s keynote presented the current status of Data.Gov, including semantic.data.gov and healthdata.gov, and the journey of creating a data ecosystem. Her slides are available online. Musen’s keynote presented lessons learned from the “AI winter” and what Semantic Web researchers should take into account today. A key take-away was “approximate semantics goes a long way.”

Big Graph Data Panel

This panel was a highlight of the conference. The panelists were Tim Berners-Lee (W3C), John Giannandrea (Google), Mike Stonebraker (MIT) and Bryan Thompson (Systap), with Frank van Harmelen (VU) as the moderator. Stonebraker stated that the Semantic Web is just another application of a graph database. Berners-Lee responded that he enjoyed staying at the application layer. The question of “what is big data?” arose, and Stonebraker defined it as Big Volume, Big Velocity or Big Variety. He added that big data is only a problem if your data needs grow faster than memory gets cheaper. Giannandrea stated that at Google they control dataset size by reconciling; the hard problem, however, is deciding when two things are the same. Berners-Lee commented that you should reconcile the vocabularies you are planning to share. Stonebraker added that the biggest problem is trying to put together, after the fact, things that were never designed to be put together. Stonebraker also stated that relational technology is old and obsolete and has been beaten in every vertical by custom solutions. New database architectures could be 100 times faster, but benchmarks are lacking. Parallelization was another topic: Stonebraker commented that everything done at scale must be parallelized, otherwise it would run forever.

Throughout the conference

There were several talks on SPARQL and benchmarking: “Using SPARQL to Query BioPortal”, which summarized the challenges of running a public SPARQL endpoint; “Efficient Execution of top-k SPARQL queries”; “Benchmarking Federated SPARQL Query Engines: Are Existing Testbeds Enough?”; “SRBench: A Streaming RDF/SPARQL Benchmark”; and “SPLODGE: Systematic Generation of SPARQL Benchmark Queries for Linked Open Data”.

Several industry talks also took place. During the Consuming Linked Data Workshop, Bart van Leeuwen, a firefighter from the city of Amsterdam, gave a keynote, “Emergency Response: Using Real-time Semantic Web Technology”, and Evan Sandhaus from the New York Times gave a keynote on “Linked Data at the New York Times: The First 161 Years”. Kavitha Srinivas from IBM Research gave the keynote at the Scalable and High Performance Systems Workshop, where she presented the internals of DB2’s RDF database. The industry track had presentations by many organizations, including Oracle, BestBuy, FluidOps, Antidot, Open University and YarcData. Jay Myers stated that the Semantic Web is still hard to sell and that more use cases are needed. Myers asked people to submit their own Semantic Web elevator pitches at http://www.semanticwebelevatorpitch.com/.

Finally, the best student research paper award went to “Ontology-Based Access to Probabilistic Data with OWL-QL” and the best research paper award went to “Discovering Concept Coverings in Ontologies of Linked Data Sources”.

Overall, this was a fantastic conference, with a great mix of academia and industry. I was surprised to see several industry applications and so many industry folks attending. Can’t wait for next year’s conference, which will be in Sydney.

Photo: Twitter / @phaase

About the Author

Juan Sequeda is a Ph.D. student at the University of Texas at Austin and an NSF Graduate Research Fellow. His research is at the intersection of the Semantic Web and relational databases. He co-created the Consuming Linked Data Workshop series and regularly gives talks at academic and industry Semantic Web conferences. Juan is an Invited Expert on the W3C RDB2RDF Working Group and an editor of the “Direct Mapping of Relational Data to RDF” specification. Juan is also the founder of a new startup, Capsenta, which is a spin-off from his research.