Common Crawl Founder Gil Elbaz Speaks About New Relationship With Amazon, Semantic Web Projects Using Its Corpus, And Why Open Web Crawls Matter To Developing Big Data Expertise
The Common Crawl Foundation’s repository of openly and freely accessible web crawl data is about to go live as a Public Data Set on Amazon Web Services. The non-profit Common Crawl is the vision of Gil Elbaz, who founded Applied Semantics (whose AdSense technology prompted Google to acquire it) as well as the Factual open data aggregation platform, and it counts Nova Spivack, who’s been behind semantic services from Twine to Bottlenose, among its board of directors.
Elbaz describes his goal in developing the repository: “You can’t access, let alone download, the Google or the Bing crawl data. So certainly we’re differentiated in being very open and transparent about what we’re crawling and actually making it available to developers.”
“You might ask why is it going to be revolutionary to allow many more engineers and researchers and developers and students access to this data, whereas historically you have to work for one of the big search engines…. The question is, the world has the largest-ever corpus of knowledge out there on the web, and is there more that one can do with it than Google and Microsoft and a handful of other search engines are already doing? And the answer is unquestionably yes.”
