Posts Tagged ‘Wikipedia’

Wikilinks Corpus: What Will You Do With 40 Million Disambiguated Entity Mentions Across 10 Million-Plus Web Pages?

Last Friday saw the release of the Wikilinks Corpus from Research at Google, a collection of 40 million entity mentions in context.

As explained in a blog post by Dave Orr, Amar Subramanya, and Fernando Pereira at Google Research, the Big Data set “involves 40 million total disambiguated mentions within over 10 million web pages — over 100 times bigger than the next largest corpus.” The mentions, the post relates, are found by looking for links to Wikipedia pages where the anchor text of the link closely matches the title of the target Wikipedia page. If each page on Wikipedia is thought of as an entity, the post says, then the anchor text can be thought of as a mention of the corresponding entity.
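
The matching idea described in the post can be sketched in a few lines of Python. This is only an illustration, not Google's actual pipeline: the function names, the use of `difflib` for "closely matches," and the 0.8 similarity threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class _LinkCollector(HTMLParser):
    """Collects (href, anchor text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only accumulate text while inside an anchor that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def wikipedia_title(href):
    """Return the article title encoded in a Wikipedia URL, or None."""
    parts = urlparse(href)
    host = parts.netloc.lower()
    if host != "wikipedia.org" and not host.endswith(".wikipedia.org"):
        return None
    if not parts.path.startswith("/wiki/"):
        return None
    return unquote(parts.path[len("/wiki/"):]).replace("_", " ")


def find_mentions(html, threshold=0.8):
    """Return (anchor text, Wikipedia title) pairs that closely match.

    The threshold is an assumed value chosen for illustration.
    """
    collector = _LinkCollector()
    collector.feed(html)
    mentions = []
    for href, text in collector.links:
        title = wikipedia_title(href)
        if title is None or not text:
            continue
        score = SequenceMatcher(None, text.lower(), title.lower()).ratio()
        if score >= threshold:
            mentions.append((text, title))
    return mentions
```

Under these assumptions, a link such as `<a href="https://en.wikipedia.org/wiki/Barack_Obama">Barack Obama</a>` would count as a mention of the entity "Barack Obama," while a link to the same page with the anchor text "click here" would be skipped.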

Wikidata Phase 2 In Full Swing

In December the Semantic Web Blog spoke with Wikidata project director Denny Vrandecic about progress on Phase 1 of the work to create a free knowledge base about the world that can be read and edited by humans and machines (see story here). At the time, Vrandecic explained that January would begin the roll-out of language-by-language editions – first up were Hungarian, Hebrew and Italian – on the Wikipedias.

Last week brought another language on board, as Wikidata Phase 1 went live on English Wikipedia, with Wikidata language links supplementing locally-hosted ones there too. March 6 should see deployment to the Wikipedias that do not have language links.

In an important update, Phase 2 of the overall effort to centralize access to and management of structured data – which was in development as Phase 1 progressed – saw its first fruits for use on Wikidata.org (not yet on Wikipedia) earlier this month: Infoboxes.

Wikidata: People And Bots Busy Filling The System In Phase One

Ever heard of the Finnish television series Matkaoppaat? It’s a program about tour guides abroad – something of a reality show that looks like it has already spawned copycat programs with more on the way in other countries.

But of more interest to readers of The Semantic Web Blog is that just a couple of days ago, the series was added as item Q1000000 to Wikidata, on the heels of other recent entries like the English town Newton-le-Willows (item ID Q750000) and American alpine skier Tim Jitloff (ID Q500000). They’re following in the footsteps of earlier items like Dutch Wikipedia (ID Q10000), which was added just four days after Wikidata was launched on Oct. 30.

“Right now the system is launched (since end of October) and people and bots are filling it,” says Wikidata project director Denny Vrandecic, of the Wikimedia Foundation’s effort to create a free knowledge base about the world that can be read and edited by humans and machines alike.

WikiSeer Tackles Semantic Summaries

A new article out of WikiSeer reports that “the Santa Clara based start-up pioneering real-time semantic summarization, today announced that it has successfully tested, training and completed its 1.0 platform update using more than 3.5 million English-based articles available on the Wikipedia.org portal as well as from thousands of additional websites. In real-time WikiSeer captures the essence and core principles from any text document by extracting the five most instructive and informative sentences from a page, link or article. In the course of using Wikipedia there were thousands of articles (topics) whereby the platform would cull through tens of pages and paragraphs to arrive at the five most important sentences (user definable up to 10) with better than 85% accuracy based on user testing and feedback.”

A Look at How Programming Languages Influence Each Other

Tony Hirst has written up a demonstration that maps how programming languages influence each other according to Wikipedia. He explains, “By way of demonstrating how the recipe described in Visualising Related Entries in Wikipedia Using Gephi can easily be turned to other things, here’s a map of how different computer programming languages influence each other according to DBpedia/Wikipedia (above).” See the rest of his demonstration here.

In the comments, Hirst notes, “I think one of the major benefits to be had from these sorts of visualisation is in support of a visual analytical conversation between the analyst and the data. Read more

Finding What’s ‘Important’ on Wikipedia with WikiMaps

ScienceDaily recently covered an interesting new resource, WikiMaps. According to the article, “An international research team has developed a dynamic tool that allows you to see a map of what is ‘important’ on Wikipedia and the connections between different entries. The tool, which is currently in the ‘alpha’ phase of development, displays classic musicians, bands, people born in the 1980s, and selected celebrities, including Lady Gaga, Barack Obama, and Justin Bieber. A slider control, or play button, lets you move through time to see how a particular topic or group has evolved over the last 3 or 4 years. The desktop version allows you to select any article or topic.”

Datasift Announces Wikistats

Datasift recently announced a new feature, Wikistats, and added Wikipedia to the company’s list of data sources. The company reports, “Through Wikistats.co, DataSift provides a real-time insight into the trending articles on Wikipedia in the last 24 hours. Just as we identified the most popular stories on Twitter when we created Tweetmeme, Wikistats is another great showcase of what’s possible with DataSift’s Social-Data platform. By filtering and analyzing the activity stream of new articles and edits on Wikipedia, we’re able to surface an insight into the top articles and content being created. As well as providing a view into all articles on Wikipedia, we use our NLP (Natural Language Processing) service to categorize articles into popular categories including technology, banking, celebrities, politics, sports, and more.”

Wikidata Closes SemTechBiz SF with a Bang

Anthony Myers of CMS Wire reports, “Core ideas about a more structured Internet, vis a vis the semantic Web, are quickly approaching mainstream consciousness. When Wikipedia, the sixth most popular Web site in the world, debuts its Wikidata platform later this year, it will be a major step in this direction. ‘Wikidata is going to blow everything else out of the water,’ Joe Devon of Startup Devs said during the closing panel of the 2012 Semantic Technology & Business Conference. Devon is also on the advisory board, but Dave McComb, who co-founded the SemTech Conference, also asserted his belief in how huge Wikidata is going to be.”

Schema.org Now Supports External Lists

The schema.org official blog has announced support for enumerated lists. Adding this support allows developers using schema.org to use selected externally maintained vocabularies in their schema.org markup. According to the W3C-hosted schema.org WebSchemas wiki, “This is in addition to the existing extension mechanisms we support, and the general ability to include whatever markup you like in your pages. The focus here is on external vocabularies which can be thought of as ‘supported’ (or anticipated) in some sense by schema.org.”

In other words, “Schema.org markup uses links into well-known authority lists to clarify which particular instance of a schema.org type (eg. Country) is being mentioned.”

For example, consider a list of countries of the world. A developer could use this URI from Wikipedia to reference the USA or this one from the UN FAO, or this one from GeoNames.

Wikidata, and a clash of world views

Remember the days before Wikipedia had all the answers? We looked things up in libraries, referring to shelf-filling encyclopaedias. We bought CD-ROMs (remember them?) full of facts and pictures and video clips. We asked people. Sometimes, school homework actually required some work more strenuous than a cut and paste. We went about our business without remembering that New Coke briefly entered our lives on this day in 1985.

Wikipedia is far from perfect, and some of the concern around its role in a wider dumbing down of thought and argument may be justified. But, despite that, it’s a remarkable achievement and a wonderful resource. Those who argued that it would never work have clearly been proven wrong. Carefully maintained processes and the core principle of the neutral point of view mostly serve contributors well.

With Wikimedia Deutschland‘s recent announcement of Wikidata, many of the early concerns about Wikipedia itself have resurfaced once again.
