Wikilinks Corpus: What Will You Do With 40 Million Disambiguated Entity Mentions Across 10 Million-Plus Web Pages?
Last Friday, Research at Google released the Wikilinks Corpus, a collection of 40 million entity mentions in context.
As explained in a blog post by Dave Orr, Amar Subramanya, and Fernando Pereira at Google Research, the data set “involves 40 million total disambiguated mentions within over 10 million web pages — over 100 times bigger than the next largest corpus.” The mentions, the post relates, are found by looking for links to Wikipedia pages where the anchor text of the link closely matches the title of the target Wikipedia page. If each page on Wikipedia is thought of as an entity, then the anchor text can be thought of as a mention of the corresponding entity, the post says.
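To make the idea concrete, here is a minimal Python sketch of that kind of matching. It is not Google's actual pipeline: the post does not say how "closely matches" is measured, so the `difflib` similarity score, the 0.8 threshold, and the names `WikiLinkParser` and `find_mentions` are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


def normalize(text):
    """Lowercase, turn underscores into spaces, and collapse whitespace."""
    return " ".join(text.replace("_", " ").lower().split())


class WikiLinkParser(HTMLParser):
    """Collects (anchor_text, wikipedia_title) pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._title = None  # title of the Wikipedia link currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        url = urlparse(dict(attrs).get("href") or "")
        host = url.netloc.lower()
        is_wikipedia = host == "wikipedia.org" or host.endswith(".wikipedia.org")
        if is_wikipedia and url.path.startswith("/wiki/"):
            self._title = unquote(url.path[len("/wiki/"):])
            self._text = []

    def handle_data(self, data):
        if self._title is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._title is not None:
            self.links.append(("".join(self._text), self._title))
            self._title = None


def find_mentions(html, threshold=0.8):
    """Return links whose anchor text is close to the target page title.

    The threshold is an assumed value, not one taken from the post.
    """
    parser = WikiLinkParser()
    parser.feed(html)
    parser.close()
    mentions = []
    for anchor, title in parser.links:
        score = SequenceMatcher(None, normalize(anchor), normalize(title)).ratio()
        if score >= threshold:
            mentions.append({"mention": anchor.strip(), "entity": title})
    return mentions
```

Under these assumptions, a link to `/wiki/Barack_Obama` with the anchor text "Barack Obama" would count as a mention of that entity, while a link to `/wiki/Chicago` with the anchor text "this city" would be skipped because the text does not resemble the page title.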



