DataSift recently announced a new feature, Wikistats, and added Wikipedia to the company’s list of data sources. The company reports, “Through Wikistats.co, DataSift provides a real-time insight into the trending articles on Wikipedia in the last 24 hours. Just as we identified the most popular stories on Twitter when we created Tweetmeme, Wikistats is another great showcase of what’s possible with DataSift’s Social-Data platform. By filtering and analyzing the activity stream of new articles and edits on Wikipedia, we’re able to surface an insight into the top articles and content being created. As well as providing a view into all articles on Wikipedia, we use our NLP (Natural Language Processing) service to categorize articles into popular categories including technology, banking, celebrities, politics, sports, and more.”
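DataSift has not published the internals of Wikistats, but the core idea of surfacing top articles from the edit stream can be sketched against the public MediaWiki recent-changes API. This is a minimal illustration, not DataSift’s pipeline; it simply counts which articles were edited most in the latest batch of changes:

```python
# A minimal sketch (not DataSift's actual pipeline): count the most-edited
# English Wikipedia articles in the recent-changes feed exposed by the
# public MediaWiki API, roughly the idea behind a "trending articles" list.
from collections import Counter
import requests

API = "https://en.wikipedia.org/w/api.php"
params = {
    "action": "query",
    "list": "recentchanges",
    "rcnamespace": 0,   # main (article) namespace only
    "rcprop": "title",
    "rclimit": 500,     # maximum batch size for anonymous clients
    "format": "json",
}

counts = Counter()
resp = requests.get(API, params=params, timeout=30).json()
for change in resp["query"]["recentchanges"]:
    counts[change["title"]] += 1

for title, edits in counts.most_common(10):
    print(f"{edits:4d}  {title}")
```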
Anthony Myers of CMS Wire reports, “Core ideas about a more structured Internet, vis-à-vis the semantic Web, are quickly approaching mainstream consciousness. When Wikipedia, the sixth most popular Web site in the world, debuts its Wikidata platform later this year, it will be a major step in this direction. ‘Wikidata is going to blow everything else out of the water,’ Joe Devon of Startup Devs said during the closing panel of the 2012 Semantic Technology & Business Conference. Devon is also on the advisory board, and Dave McComb, who co-founded the SemTech Conference, likewise asserted his belief in how huge Wikidata is going to be.”
The official schema.org blog has announced support for enumerated lists, allowing developers to reference selected externally maintained vocabularies in their schema.org markup. According to the W3C-hosted schema.org WebSchemas wiki, “This is in addition to the existing extension mechanisms we support, and the general ability to include whatever markup you like in your pages. The focus here is on external vocabularies which can be thought of as ‘supported’ (or anticipated) in some sense by schema.org.”
In other words, “Schema.org markup uses links into well-known authority lists to clarify which particular instance of a schema.org type (eg. Country) is being mentioned.”
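For illustration, here is a minimal sketch of what such an authority link can look like, built as schema.org JSON-LD in Python. Using sameAs to point at a Wikipedia entry is one common pattern, not the only mechanism the WebSchemas wiki describes, and the address itself is made up for the example:

```python
# A minimal sketch: schema.org markup disambiguates a Country value by
# linking to an entry in an external, well-known list (here, a Wikipedia
# URL). Emitted as JSON-LD for brevity; the same link can be expressed in
# microdata or RDFa.
import json

address = {
    "@context": "http://schema.org",
    "@type": "PostalAddress",
    "addressCountry": {
        "@type": "Country",
        "name": "United States",
        # The external authority link that pins down *which* country:
        "sameAs": "http://en.wikipedia.org/wiki/United_States",
    },
}

print(json.dumps(address, indent=2))
```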
Remember the days before Wikipedia had all the answers? We looked things up in libraries, referring to shelf-filling encyclopaedias. We bought CD-ROMs (remember them?) full of facts and pictures and video clips. We asked people. Sometimes, school homework actually required work more strenuous than cut and paste. We went about our business without remembering that New Coke briefly entered our lives on this day in 1985.
Wikipedia is far from perfect, and some of the concern around its role in a wider dumbing down of thought and argument may be justified. But, despite that, it’s a remarkable achievement and a wonderful resource. Those who argued that it would never work have clearly been proven wrong. Carefully maintained processes and the core principle of the neutral point of view mostly serve contributors well.
Mark Graham recently raised some concerns regarding the Wikidata project in The Atlantic. Graham writes, “Wikidata will create a collaborative database that is both machine readable and human editable and which will underpin a lot of knowledge that is presented in all 284 language versions of Wikipedia. In other words, the encyclopaedia plans to become part of the movement from a mostly human-readable Web to a Web in which computers and software can better make sense of information… The reason that Wikidata marks such a significant moment in Wikipedia’s history is the fact that it eliminates some of the scope for culturally contingent representations of places, processes, people, and events. However, even more concerning is the fact that this sort of congealed and structured knowledge is unlikely to reflect the opinions and beliefs of traditionally marginalized groups.”
Graham continues, “It is important that different communities are able to create and reproduce different truths and worldviews. And while certain truths are universal (Tokyo is described as a capital city in every language version that includes an article about Japan), others are more messy and unclear (e.g., should the population of Israel include occupied and contested territories?).”
Denny Vrandečić, project director of Wikidata, posted a thoughtful response to Graham’s article. I have re-posted Vrandečić’s response in its entirety:
Thank you for your well-thought criticism. When we were thinking first of adding structured data to Wikipedia, we were indeed thinking of giving every language edition its own data space. This way the Arab and the Hebrew Wikipedia community would not interfere with each other, nor would the Estonian and the Russian communities interfere with each other.
Eileen Brown recently reported that SWiPE hopes to make querying search engines a less frustrating experience. Brown writes, “If you struggle with RDF triples (Resource Description Framework) and SPARQL (the query language and protocol for RDF), do not despair. SWiPE (Searching WIkiPedia by Example) allows semantic and well-structured knowledge bases to be easily queried from within the pages of Wikipedia. If you want to know which cities in Florida, founded in the last century, have more than 50 thousand people, you will be able to enter the query conditions directly into the Infobox of a Wikipedia page. SWiPE activates certain fields of Wikipedia that generate equivalent SPARQL queries executed on DBpedia.”
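For a sense of what such generated queries look like under the hood, here is a rough sketch of a SPARQL query for Brown’s Florida example, run against the public DBpedia endpoint with the SPARQLWrapper Python library. The specific property names (dbo:isPartOf, dbo:foundingDate, dbo:populationTotal) are assumptions that vary across DBpedia releases, and this is not necessarily the exact query SWiPE emits:

```python
# A rough sketch of the kind of SPARQL query a tool like SWiPE might
# generate for "cities in Florida, founded in the last century, with more
# than 50,000 people", executed against the public DBpedia endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?city ?population ?founded WHERE {
  ?city dbo:isPartOf dbr:Florida ;
        dbo:populationTotal ?population ;
        dbo:foundingDate ?founded .
  FILTER (?population > 50000)
  FILTER (?founded >= "1900-01-01"^^xsd:date)
}
ORDER BY DESC(?population)
"""

endpoint = SPARQLWrapper("http://dbpedia.org/sparql")
endpoint.setQuery(QUERY)
endpoint.setReturnFormat(JSON)

# Print each matching city with its population.
for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["city"]["value"], row["population"]["value"])
```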
A paper entitled “Recovering Semantics of Tables on the Web” was presented at the 37th International Conference on Very Large Data Bases (VLDB) in Seattle, WA. The paper’s authors include six Google engineers along with Petros Venetis of Stanford University and Gengxin Miao of UC Santa Barbara. The paper summarizes an approach for recovering the semantics of web tables by adding annotations beyond what a table’s author has provided. It is of interest to developers working on the semantic web because it gives insight into how programmers can use semantic data (a database of triples) and Open Information Extraction (OIE) to enhance unstructured data on the web. In addition, the authors compare a “maximum-likelihood” model for assigning class labels to tables with the “database of triples” approach, and show that their method is capable of labeling “an order of magnitude more tables on the web than is possible using Wikipedia/YAGO and many more than Freebase.”
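The paper’s actual maximum-likelihood model is more involved, but the flavor of the triples-based alternative can be shown with a toy example: look up each cell of a table column in an isA database of (instance, class) pairs and keep the class labels that cover enough of the column. Everything below, including the tiny isA table, is made up for illustration:

```python
# An illustrative toy (not the paper's model): label a table column by
# looking up each cell in an isA database of (instance, class) triples and
# scoring candidate class labels by how many cells they cover.
from collections import Counter

ISA = {
    "paris":  {"city", "person"},   # ambiguous: Paris Hilton vs. the city
    "london": {"city"},
    "tokyo":  {"city", "capital"},
    "miami":  {"city"},
}

def label_column(cells, isa, min_support=0.5):
    """Return candidate class labels covering at least min_support of cells."""
    scores = Counter()
    for cell in cells:
        for cls in isa.get(cell.lower(), ()):
            scores[cls] += 1
    threshold = min_support * len(cells)
    return [(cls, n) for cls, n in scores.most_common() if n >= threshold]

print(label_column(["Paris", "London", "Tokyo", "Miami"], ISA))
# -> [('city', 4)]  "city" covers every cell; "person"/"capital" fall below
```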
In 2005, I started learning about the so-called Semantic Web. It wasn’t until 2008, the same year I started my PhD, that I finally understood what the Semantic Web was really about. At the time, I made a $1,000 bet with three college buddies that the Semantic Web would be mainstream by the time I finished my PhD. I know I’m going to win! In this post, I will argue why.