A new article reports that WikiSeer, “the Santa Clara-based start-up pioneering real-time semantic summarization, today announced that it has successfully tested, trained, and completed its 1.0 platform update using more than 3.5 million English-language articles available on the Wikipedia.org portal as well as from thousands of additional websites. In real time, WikiSeer captures the essence and core principles of any text document by extracting the five most instructive and informative sentences from a page, link, or article. In the course of using Wikipedia there were thousands of articles (topics) whereby the platform would cull through tens of pages and paragraphs to arrive at the five most important sentences (user-definable up to 10) with better than 85% accuracy based on user testing and feedback.” Read more
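WikiSeer's method is proprietary, but frequency-based extractive summarization is one common way to pick the "most informative" sentences of a document. The sketch below is purely illustrative of that general approach, not of WikiSeer's actual algorithm:

```python
from collections import Counter
import re

def top_sentences(text, n=5):
    """Return the n highest-scoring sentences, in document order.
    A sentence's score is the summed corpus frequency of its words —
    a crude but classic extractive-summarization heuristic."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    freq = Counter(re.findall(r'[a-z]+', text.lower()))
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: -sum(freq[w] for w in re.findall(r'[a-z]+', sentences[i].lower())),
    )
    keep = sorted(ranked[:n])  # restore original document order
    return [sentences[i] for i in keep]
```

Real systems add much more (position weighting, redundancy removal, user feedback), but the extract-and-rank skeleton is the same.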
Tony Hirst has written up a demonstration that maps how programming languages influence each other according to Wikipedia. He explains, “By way of demonstrating how the recipe described in Visualising Related Entries in Wikipedia Using Gephi can easily be turned to other things, here’s a map of how different computer programming languages influence each other according to DBpedia/Wikipedia (above).” See the rest of his demonstration here.
In the comments, Hirst notes, “I think one of the major benefits to be had from these sorts of visualisation is in support of a visual analytical conversation between the analyst and the data.” Read more
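Maps like Hirst's are built from DBpedia's influence links. As an illustrative sketch (the property name `dbo:influencedBy` is DBpedia's; the helper below only assembles the query string, which you would then send to the public endpoint):

```python
# DBpedia's public SPARQL endpoint, where a query like this would be sent.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

def influence_query(limit=100):
    """Return a SPARQL query for (language, influencer) pairs where both
    resources are typed as programming languages in the DBpedia ontology."""
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?lang ?influencer WHERE {{
  ?lang a dbo:ProgrammingLanguage ;
        dbo:influencedBy ?influencer .
  ?influencer a dbo:ProgrammingLanguage .
}}
LIMIT {limit}
""".strip()
```

The resulting edge list (each row a directed "influenced by" link) is exactly the kind of input Gephi consumes for the network layout Hirst describes.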
ScienceDaily recently covered an interesting new resource, WikiMaps. According to the article, “An international research team has developed a dynamic tool that allows you to see a map of what is ‘important’ on Wikipedia and the connections between different entries. The tool, which is currently in the ‘alpha’ phase of development, displays classic musicians, bands, people born in the 1980s, and selected celebrities, including Lady Gaga, Barack Obama, and Justin Bieber. A slider control, or play button, lets you move through time to see how a particular topic or group has evolved over the last 3 or 4 years. The desktop version allows you to select any article or topic.” Read more
Datasift recently announced a new feature, Wikistats, and added Wikipedia to the company’s list of data sources. The company reports, “Through Wikistats.co, DataSift provides a real-time insight into the trending articles on Wikipedia in the last 24 hours. Just as we identified the most popular stories on Twitter when we created Tweetmeme, Wikistats is another great showcase of what’s possible with DataSift’s Social-Data platform. By filtering and analyzing the activity stream of new articles and edits on Wikipedia, we’re able to surface an insight into the top articles and content being created. As well as providing a view into all articles on Wikipedia, we use our NLP (Natural Language Processing) service to categorize articles into popular categories including technology, banking, celebrities, politics, sports, and more.” Read more
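DataSift's pipeline is not public, but the core idea of surfacing trending articles — tallying edit events per article over a time window — can be sketched in a few lines (a toy illustration, not DataSift's implementation):

```python
from collections import Counter

def trending(edit_events, top=10):
    """edit_events: iterable of (timestamp, article_title) pairs, assumed
    already filtered to the last 24 hours; returns the most-edited articles
    with their edit counts."""
    counts = Counter(title for _, title in edit_events)
    return counts.most_common(top)
```

A production system would layer rate normalization and the NLP categorization the company describes on top of this simple count.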
Anthony Myers of CMS Wire reports, “Core ideas about a more structured Internet, vis-à-vis the semantic Web, are quickly approaching mainstream consciousness. When Wikipedia, the sixth most popular Web site in the world, debuts its Wikidata platform later this year, it will be a major step in this direction. ‘Wikidata is going to blow everything else out of the water,’ Joe Devon of Startup Devs said during the closing panel of the 2012 Semantic Technology & Business Conference. Devon is also on the advisory board, and Dave McComb, who co-founded the SemTech Conference, also asserted his belief in how huge Wikidata is going to be.” Read more
The schema.org official blog has announced support for enumerated lists. Adding this support allows developers using schema.org to use selected externally maintained vocabularies in their schema.org markup. According to the W3C-hosted schema.org WebSchemas wiki, “This is in addition to the existing extension mechanisms we support, and the general ability to include whatever markup you like in your pages. The focus here is on external vocabularies which can be thought of as ‘supported’ (or anticipated) in some sense by schema.org.”
In other words, “Schema.org markup uses links into well-known authority lists to clarify which particular instance of a schema.org type (eg. Country) is being mentioned.”
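As a hypothetical illustration of that idea, microdata markup might disambiguate a `Country` by linking to a well-known authority page — here using the `sameAs` property and a Wikipedia URL (the exact property and vocabulary a given page should use depend on schema.org's current guidance):

```html
<!-- Illustrative only: a schema.org Country disambiguated by a link
     into a well-known authority list (its Wikipedia page). -->
<div itemscope itemtype="https://schema.org/Country">
  <span itemprop="name">France</span>
  <link itemprop="sameAs" href="https://en.wikipedia.org/wiki/France" />
</div>
```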
Remember the days before Wikipedia had all the answers? We looked things up in libraries, referring to shelf-filling encyclopaedias. We bought CD-ROMs (remember them?) full of facts and pictures and video clips. We asked people. Sometimes, school home work actually required some work more strenuous than a cut and paste. We went about our business without remembering that New Coke briefly entered our lives on this day in 1985.
Wikipedia is far from perfect, and some of the concern around its role in a wider dumbing down of thought and argument may be justified. But, despite that, it’s a remarkable achievement and a wonderful resource. Those who argued that it would never work have clearly been proven wrong. Carefully maintained processes and the core principle of the neutral point of view mostly serve contributors well.
Mark Graham recently raised some concerns regarding the Wikidata project in The Atlantic. Graham writes, “Wikidata will create a collaborative database that is both machine readable and human editable and which will underpin a lot of knowledge that is presented in all 284 language versions of Wikipedia. In other words, the encyclopaedia plans to become part of the movement from a mostly human-readable Web to a Web in which computers and software can better make sense of information… The reason that Wikidata marks such a significant moment in Wikipedia’s history is the fact that it eliminates some of the scope for culturally contingent representations of places, processes, people, and events. However, even more concerning is the fact that this sort of congealed and structured knowledge is unlikely to reflect the opinions and beliefs of traditionally marginalized groups.”
Graham continues, “It is important that different communities are able to create and reproduce different truths and worldviews. And while certain truths are universal (Tokyo is described as a capital city in every language version that includes an article about Japan), others are more messy and unclear (e.g. should the population of Israel include occupied and contested territories?).”
Denny Vrandečić, project director of Wikidata, posted a thoughtful response to Graham’s article. I have re-posted Vrandečić’s response in its entirety:
Thank you for your well-thought criticism. When we were thinking first of adding structured data to Wikipedia, we were indeed thinking of giving every language edition its own data space. This way the Arab and the Hebrew Wikipedia community would not interfere with each other, nor would the Estonian and the Russian communities interfere with each other. Read more
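One way to reconcile a single shared database with contested facts is to let a property hold several competing, sourced claims rather than one canonical value. The sketch below is an illustrative data model in that spirit — not Wikidata's actual schema, and the figures and source labels are invented for the example:

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    value: object
    source: str            # who asserts this value
    rank: str = "normal"   # e.g. "preferred", "normal", "deprecated"

@dataclass
class Item:
    label: str
    claims: dict = field(default_factory=dict)  # property name -> [Claim, ...]

    def add_claim(self, prop, claim):
        self.claims.setdefault(prop, []).append(claim)

# Two sources disagree on a population figure; both are recorded side by side
# instead of one overwriting the other. (Figures illustrative.)
israel = Item("Israel")
israel.add_claim("population", Claim(7_800_000, source="national census"))
israel.add_claim("population", Claim(7_600_000, source="international estimate"))
```

Each language edition can then choose which sourced claim to display, rather than being forced into a single contested number.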
Eileen Brown recently reported that SWiPE hopes to make querying structured knowledge bases a less frustrating experience. Brown writes, “If you struggle with RDF triples (Resource Description Framework) and SPARQL (the query language and protocol for RDF), do not despair. SWiPE (Searching WIkiPedia by Example) allows semantic and well-structured knowledge bases to be easily queried from within the pages of Wikipedia. If you want to know which cities in Florida, founded in the last century, have more than 50 thousand people, you will be able to enter the query conditions directly into the Infobox of a Wikipedia page. SWiPE activates certain fields of Wikipedia that generate equivalent SPARQL queries executed on DBpedia.” Read more
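Brown's running example corresponds to a DBpedia SPARQL query roughly like the one assembled below. The property names (`dbo:foundingYear`, `dbo:populationTotal`, `dbo:isPartOf`) are DBpedia's, but the exact query SWiPE generates from an infobox may differ:

```python
def swipe_example_query():
    """Return a SPARQL query approximating the quoted example: Florida
    cities founded in the 20th century with population over 50,000."""
    return """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?city WHERE {
  ?city a dbo:City ;
        dbo:isPartOf dbr:Florida ;
        dbo:foundingYear ?year ;
        dbo:populationTotal ?pop .
  FILTER (?year >= "1900"^^xsd:gYear && ?pop > 50000)
}
""".strip()
```

SWiPE's contribution is that users never see this text: filling in infobox fields by example stands in for writing the FILTER clauses by hand.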