[Editor’s Note: Since 2007, Sindice.com has served as a specialized search engine allowing Semantic Web practitioners and researchers to locate structured data on the Web. At the peak of its activity, Sindice.com had an index of over 700M pages and processed 20M pages per day. In a post last week, the founding team announced the end of support for Sindice.com to concentrate on delivering the technology developed for the engine to enterprise users. This week, SemanticWeb.com is proud to host a guest post by the founding team explaining the history, the challanges and the future of this technology.]
The word “Sindice” has been around for quite some time in research and practice on the “Semantic Web” or “lets see how we can turn the web into a database”.
Since 2007, Sindice.com has served as a specialized search engine that would do a crazy thing: throw away the text and just concentrate on the “markup” of the web pages. Sindice would provide an advanced API to query RDF, RDFa, Microformats and Microdata found on web sites, together with a number of other services. Sindice turned useful, we guess, as approximately 1100 scientific works in the last few years refer to it in a way or another.
Last week, we the founding team announced the end of our support of the original Sindice.com semantic search engine to concentrate on the technology that came from it.
With the launch in 2012 of Schema.org, Google and others have effectively embraced the vision of the “Semantic Web.” With the RDFa standard, and now even more with JSON-LD, richer markup is becoming more and more popular on websites. While there might not be public web data “search APIs”, large collections of crawled data (pages and RDF) exist today which are made available on cloud computing platforms for easy analysis with your favorite big data paradigm.
Even more interestingly, the technology of Sindice.com has been made available in several projects maintained either as open source (see below) or commercially supported by the Sindice.com team now transitioned in the Sindice LTD company, AKA SindiceTech.
It has been quite a journey for us, and given there is no single summary anywhere we thought we’d take this occasion to write and share it.
This is both for “historical” reasons and as a way to glimpse at future directions of this field and these technologies.