Wikilinks Corpus: What Will You Do With 40 Million Disambiguated Entity Mentions Across 10 Million-Plus Web Pages?
Last Friday saw the release of the Wikilinks Corpus from Research at Google, 40 million entities in context strong.
As explained in a blog post by Dave Orr, Amar Subramanya, and Fernando Pereira at Google Research, the Big Data set “involves 40 million total disambiguated mentions within over 10 million web pages,” making it, in the post's words, “over 100 times bigger than the next largest corpus.” The mentions, the post relates, are found by looking for links to Wikipedia pages where the anchor text of the link closely matches the title of the target Wikipedia page. If each page on Wikipedia is thought of as an entity, the post says, then the anchor text can be thought of as a mention of the corresponding entity.
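To make the idea concrete, here is a minimal Python sketch of that kind of mention extraction. It is an illustration only, not Google's actual pipeline: the names `WikiLinkParser`, `title_from_href`, and `find_mentions` are hypothetical, and the similarity measure and the 0.8 threshold are assumptions standing in for whatever "closely matches" means in the real system.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class WikiLinkParser(HTMLParser):
    """Collects (anchor text, href) pairs for links pointing at Wikipedia articles."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the Wikipedia link currently open, if any
        self._text = []     # pieces of anchor text seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        parsed = urlparse(href)
        # Assumption: an article link has a wikipedia.org host and a /wiki/ path.
        if parsed.netloc.endswith("wikipedia.org") and parsed.path.startswith("/wiki/"):
            self._href = href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def title_from_href(href):
    """Turn .../wiki/Peter_Norvig into the page title 'Peter Norvig'."""
    slug = urlparse(href).path.split("/wiki/", 1)[1]
    return unquote(slug).replace("_", " ")


def find_mentions(html, threshold=0.8):
    """Return (anchor text, entity title) pairs whose text is close to the title.

    The threshold is an illustrative assumption, not a documented value.
    """
    parser = WikiLinkParser()
    parser.feed(html)
    mentions = []
    for anchor, href in parser.links:
        title = title_from_href(href)
        score = SequenceMatcher(None, anchor.lower(), title.lower()).ratio()
        if score >= threshold:
            mentions.append((anchor, title))
    return mentions
```

Run over a page that links the text "Peter Norvig" to the Wikipedia article of that name, `find_mentions` would keep the link as a mention of the entity, while a link to the same kind of article with anchor text such as "click here" would be dropped because the text does not resemble the title.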


Eric Franzon
VP Community
Jennifer Zaino
Contributor
Angela Guess Contributor