Blekko Data Donation Is A Big Benefit To Common Crawl

Common Crawl, the non-profit organization creating a repository of openly and freely accessible web crawl data, is getting a present from search engine provider blekko. Blekko is donating its search engine ranking metadata for 140 million websites and 22 billion webpages to Common Crawl.
“The blekko data donation is a huge benefit to Common Crawl,” Common Crawl director Lisa Green told The Semantic Web Blog. “Knowing what the blekko team is crawling and how they rate those pages allows us to improve our crawler and enrich our corpus for high-value webpages.”



Common Crawl is now providing its 2012 corpus of web crawl data not only as .ARC files but also as metadata files (JSON-based metadata containing all the links from every page crawled, meta tags, headers and so on) and as text output.
Eric Franzon, VP Community
Jennifer Zaino, Contributor
Angela Guess, Contributor