Posts Tagged ‘Wikipedia’

Wikidata Phase 2 In Full Swing

In December the Semantic Web Blog spoke with Wikidata project director Denny Vrandecic about progress on Phase 1 of the work to create a free knowledge base about the world that can be read and edited by humans and machines (see story here). At the time, Vrandecic explained that January would begin the roll-out of language-by-language editions – first up were Hungarian, Hebrew and Italian – on the Wikipedias.

Last week brought another language on board, as Wikidata Phase 1 went live on English Wikipedia, with Wikidata language links supplementing locally-hosted ones there too. March 6 should see deployment to the Wikipedias that do not have language links.

In an important update, Phase 2 of the overall effort to centralize access to and management of structured data – which was in development as Phase 1 progressed – saw its first fruits for use on Wikidata.org (not yet on Wikipedia) earlier this month: Infoboxes.

Read more

Wikidata: People And Bots Busy Filling The System In Phase One

Ever heard of the Finnish television series Matkaoppaat? It’s a program about tour guides abroad – something of a reality show that looks like it has already spawned copycat programs with more on the way in other countries.

But of more interest to readers of The Semantic Web Blog is that just a couple of days ago, the series was added as item Q1000000 to Wikidata, on the heels of other recent entries like the English town Newton-le-Willows (item ID Q750000) and American alpine skier Tim Jitloff (ID Q500000). They’re following in the footsteps of earlier items like Dutch Wikipedia (ID Q10000), which was added just four days after Wikidata was launched on Oct. 30.

“Right now the system is launched (since end of October) and people and bots are filling it,” says Wikidata project director Denny Vrandecic, of the Wikimedia Foundation’s effort to create a free knowledge base about the world that can be read and edited by humans and machines alike.

Read more

WikiSeer Tackles Semantic Summaries

A new article out of WikiSeer reports that “the Santa Clara based start-up pioneering real-time semantic summarization, today announced that it has successfully tested, training and completed its 1.0 platform update using more than 3.5 million English-based articles available on the Wikipedia.org portal as well as from thousands of additional websites. In real-time WikiSeer captures the essence and core principles from any text document by extracting the five most instructive and informative sentences from a page, link or article. In the course of using Wikipedia there were thousands of articles (topics) whereby the platform would cull through tens of pages and paragraphs to arrive at the five most important sentences (user definable up to 10) with better than 85% accuracy based on user testing and feedback.” Read more
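WikiSeer's announcement does not say how its platform picks sentences, so the following is only a minimal sketch of the general idea of extractive summarization, assuming a naive word-frequency score. The stopword list, the function names `split_sentences` and `summarize`, and the scoring rule are all hypothetical and are not WikiSeer's method.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for illustration).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it",
             "that", "on", "for", "as", "with", "by"}


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def content_words(text):
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=5):
    """Return up to max_sentences top-scoring sentences, in original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        if not tokens:
            return 0.0
        # Average document-wide frequency of the sentence's content words.
        return sum(freq[t] for t in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]
```

Calling `summarize(article_text)` returns at most five sentences by default; passing `max_sentences=10` mirrors the user-definable limit mentioned in the announcement.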

A Look at How Programming Languages Influence Each Other

Tony Hirst has written up a demonstration that maps how programming languages influence each other according to Wikipedia. He explains, “By way of demonstrating how the recipe described in Visualising Related Entries in Wikipedia Using Gephi can easily be turned to other things, here’s a map of how different computer programming languages influence each other according to DBpedia/Wikipedia (above).” See the rest of his demonstration here.

In the comments, Hirst notes, “I think one of the major benefits to be had from these sorts of visualisation is in support of a visual analytical conversation between the analyst and the data. Read more

Finding What’s ‘Important’ on Wikipedia with WikiMaps

ScienceDaily recently covered an interesting new resource, WikiMaps. According to the article, “An international research team has developed a dynamic tool that allows you to see a map of what is ‘important’ on Wikipedia and the connections between different entries. The tool, which is currently in the ‘alpha’ phase of development, displays classic musicians, bands, people born in the 1980s, and selected celebrities, including Lady Gaga, Barack Obama, and Justin Bieber. A slider control, or play button, lets you move through time to see how a particular topic or group has evolved over the last 3 or 4 years. The desktop version allows you to select any article or topic.” Read more

Datasift Announces Wikistats

Datasift recently announced a new feature, Wikistats, and added Wikipedia to the company’s list of data sources. The company reports, “Through Wikistats.co, DataSift provides a real-time insight into the trending articles on Wikipedia in the last 24 hours. Just as we identified the most popular stories on Twitter when we created Tweetmeme, Wikistats is another great showcase of what’s possible with DataSift’s Social-Data platform. By filtering and analyzing the activity stream of new articles and edits on Wikipedia, we’re able to surface an insight into the top articles and content being created. As well as providing a view into all articles on Wikipedia, we use our NLP (Natural Language Processing) service to categorize articles into popular categories including technology, banking, celebrities, politics, sports, and more.” Read more

Wikidata Closes SemTechBiz SF with a Bang

Anthony Myers of CMS Wire reports, “Core ideas about a more structured Internet, vis a vis the semantic Web, are quickly approaching mainstream consciousness. When Wikipedia, the sixth most popular Web site in the world, debuts its Wikidata platform later this year, it will be a major step in this direction. ‘Wikidata is going to blow everything else out of the water,’ Joe Devon of Startup Devs said during the closing panel of the 2012 Semantic Technology & Business Conference. Devon is also on the advisory board, but Dave McComb, who co-founded the SemTech Conference, also asserted his belief in how huge Wikidata is going to be.” Read more

Schema.org Now Supports External Lists

The schema.org official blog has announced support for enumerated lists. Adding this support allows developers using schema.org to use selected externally maintained vocabularies in their schema.org markup. According to the W3C-hosted schema.org WebSchemas wiki, “This is in addition to the existing extension mechanisms we support, and the general ability to include whatever markup you like in your pages. The focus here is on external vocabularies which can be thought of as ‘supported’ (or anticipated) in some sense by schema.org.”

In other words, “Schema.org markup uses links into well-known authority lists to clarify which particular instance of a schema.org type (eg. Country) is being mentioned.”

For example, consider a list of countries of the world. A developer could use this URI from Wikipedia to reference the USA or this one from the UN FAO, or this one from GeoNames.
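As a rough illustration of pointing a schema.org type at an entry in an external authority list, the sketch below builds a small description of a Country in Python. It makes several assumptions that do not come from the announcement: it uses JSON-LD rather than inline page markup, it uses the `sameAs` property to carry the external link, and the Wikipedia URL and the helper name `country_markup` are chosen purely for illustration.

```python
import json

# Assumption: this Wikipedia page stands in for an authority-list entry for the USA.
WIKIPEDIA_USA = "https://en.wikipedia.org/wiki/United_States"


def country_markup(name, authority_uri):
    """Describe a schema.org Country and link it to an external authority URI."""
    return {
        "@context": "https://schema.org",
        "@type": "Country",
        "name": name,
        # Assumption: sameAs is used here to identify which country is meant.
        "sameAs": authority_uri,
    }


markup = country_markup("United States", WIKIPEDIA_USA)
print(json.dumps(markup, indent=2))
```

Swapping in a URI from a different authority list changes only the `authority_uri` argument, which is the point of the mechanism: the type stays `Country` while the link says which one.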

Read more

Wikidata, and a clash of world views

Remember the days before Wikipedia had all the answers? We looked things up in libraries, referring to shelf-filling encyclopaedias. We bought CD-ROMs (remember them?) full of facts and pictures and video clips. We asked people. Sometimes, school homework actually required work more strenuous than a cut and paste. We went about our business without remembering that New Coke briefly entered our lives on this day in 1985.

Wikipedia is far from perfect, and some of the concern around its role in a wider dumbing down of thought and argument may be justified. But, despite that, it’s a remarkable achievement and a wonderful resource. Those who argued that it would never work have clearly been proven wrong. Carefully maintained processes and the core principle of the neutral point of view mostly serve contributors well.

With Wikimedia Deutschland’s recent announcement of Wikidata, many of the early concerns about Wikipedia itself have resurfaced. Read more

Two Perspectives on Wikidata

Mark Graham recently raised some concerns regarding the Wikidata project in The Atlantic. Graham writes, “Wikidata will create a collaborative database that is both machine readable and human editable and which will underpin a lot of knowledge that is presented in all 284 language versions of Wikipedia. In other words, the encyclopaedia plans to become part of the movement from a mostly human-readable Web to a Web in which computers and software can better make sense of information… The reason that Wikidata marks such a significant moment in Wikipedia’s history is the fact that it eliminates some of the scope for culturally contingent representations of places, processes, people, and events. However, even more concerning is that fact that this sort of congealed and structured knowledge is unlikely to reflect the opinions and beliefs of traditionally marginalized groups.”

Graham continues, “It is important that different communities are able to create and reproduce different truths and worldviews. And while certain truths are universal (Tokyo is described as a capital city in every language version that includes an article about Japan), others are more messy and unclear (e.g. should the population of Israel include occupied and contested territories?).”

Read the full article here.

Denny Vrandečić, project director of Wikidata, posted a thoughtful response to Graham’s article. I have re-posted Vrandečić’s response in its entirety:

Mark,

Thank you for your well-thought criticism. When we were thinking first of adding structured data to Wikipedia, we were indeed thinking of giving every language edition its own data space. This way the Arab and the Hebrew Wikipedia community would not interfere with each other, nor would the Estonian and the Russian communities interfere with each other. Read more
