Posts Tagged ‘Sig.ma’

End of Support for the Sindice.com search engine: history, lessons learned, and legacy (Guest Post)

[Editor's Note: Since 2007, Sindice.com has served as a specialized search engine allowing Semantic Web practitioners and researchers to locate structured data on the Web. At the peak of its activity, Sindice.com had an index of over 700M pages and processed 20M pages per day. In a post last week, the founding team announced the end of support for Sindice.com to concentrate on delivering the technology developed for the engine to enterprise users. This week, SemanticWeb.com is proud to host a guest post by the founding team explaining the history, the challenges and the future of this technology.]

Photo of the Sindice Team, 2012

The word “Sindice” has been around for quite some time in research and practice on the “Semantic Web” or “let’s see how we can turn the web into a database”.

Since 2007, Sindice.com has served as a specialized search engine that would do a crazy thing: throw away the text and just concentrate on the “markup” of the web pages. Sindice would provide an advanced API to query RDF, RDFa, Microformats and Microdata found on web sites, together with a number of other services. Sindice turned out to be useful, we guess, as approximately 1100 scientific works in the last few years refer to it in one way or another.

Last week, we, the founding team, announced the end of our support for the original Sindice.com semantic search engine to concentrate on the technology that came from it.

With the launch in 2011 of Schema.org, Google and others have effectively embraced the vision of the “Semantic Web.” With the RDFa standard, and now even more with JSON-LD, richer markup is becoming more and more popular on websites. While there might not be public web data “search APIs”, large collections of crawled data (pages and RDF) exist today which are made available on cloud computing platforms for easy analysis with your favorite big data paradigm.

Even more interestingly, the technology of Sindice.com has been made available in several projects maintained either as open source (see below) or commercially supported by the Sindice.com team, which has now transitioned into the company Sindice LTD, AKA SindiceTech.

It has been quite a journey for us, and given that there is no single summary anywhere, we thought we’d take this occasion to write one and share it.

This is both for “historical” reasons and as a way to glimpse future directions of this field and these technologies.

Read more

Catch Up With Past Semantic Web Challenge Winners

A couple of weeks ago we reported on the results of the latest Elsevier Semantic Web Challenge (see story here). That led us to wonder: What’s been going on with some of last year’s Open Track winners (covered in this story here)?

We’ll start with TrialX, which is turning semantic technology and social media into an online clinical trial patient recruitment business. Last year we were impressed by the service’s grasp of how important it is to present a user-friendly front to consumers. They can visit the site, enter information about their health condition, and then use its decision engine to match up to clinical trials that may be appropriate for them to participate in, or even use their personal electronic health records to find out about trials.

Read more