DBpedia 3.9 is up and running. Word came today from Christian Bizer and Christopher Sahnwaldt that the new release boasts an overall increase in the number of concepts in the English edition from 3.7 million to 4 million things, thanks to being based on updated Wikipedia dumps from the spring of 2013.
Other numbers to impress:
Posts Tagged ‘Christian Bizer’
Here are some final thoughts from our panel of semantic web experts on what to expect as the New Year rings in:
Broader deployment of the schema.org terms is likely. In the study by Mühleisen and Bizer in July this year, we saw Open Graph Protocol, DC, FOAF, RSS, SIOC and Creative Commons still topping the ranks of the semantic vocabularies in use. In 2013 and beyond, I expect to see schema.org jump to the top of that list.
Christine Connors, Chief Ontologist, Knowledgent:
I think we will see an uptick in the job market for semantic technologists in the enterprise, primarily in the Fortune 2000. I expect to see some M&A activity as well from systems providers and integrators who recognize the desire to have a semantic component in their product suite. (No, I have no direct knowledge; it is my hunch!)
We will see increased competition from data analytics vendors who try to add RDF, OWL or graph stores to their existing platforms. I anticipate saying, at the end of 2013, that many of these immature deployments will leave some project teams disappointed. The mature vendors will need to put resources into sales and business development, with the right partners for consulting and systems integration, to be ready to respond to calls for proposals and assistance.
Christian Bizer and Robert Meusel of the Web Data Commons project today announced the release of a new WebDataCommons dataset: “The dataset has been extracted from the latest version of the Common Crawl. This August 2012 version of the Common Crawl contains over 3 billion HTML pages which originate from over 40 million websites (pay-level-domains). Altogether we discovered structured data within 369 million HTML pages contained in the Common Crawl corpus (12.3%). The pages containing structured data originate from 2.29 million websites (5.65%). Approximately 519 thousand of these websites use RDFa, while 140 thousand websites use Microdata. Microformats are used on 1.7 million websites.”
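For readers who want to sanity-check the percentages in the announcement, here is a rough sketch in Python. It assumes the rounded totals quoted above (3 billion pages, 40 million websites), which the announcement gives only as "over" those figures, so the website share comes out slightly higher than the quoted 5.65%. The variable and function names are illustrative and not part of any Web Data Commons tooling.

```python
# Figures as quoted in the announcement. "Over 3 billion" and "over 40 million"
# are treated as exact here, which is an assumption; the true totals are larger.
pages_total = 3_000_000_000
pages_structured = 369_000_000
sites_total = 40_000_000
sites_structured = 2_290_000

# Websites per markup format, as quoted.
sites_by_format = {
    "RDFa": 519_000,
    "Microdata": 140_000,
    "Microformats": 1_700_000,
}


def share(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


page_share = share(pages_structured, pages_total)
site_share = share(sites_structured, sites_total)

# Share of each format among the websites that carry structured data.
format_shares = {
    name: share(count, sites_structured)
    for name, count in sites_by_format.items()
}

print(f"Pages with structured data:    {page_share:.1f}%")
print(f"Websites with structured data: {site_share:.2f}%")
for name, pct in format_shares.items():
    print(f"  {name}: {pct:.1f}% of structured-data websites")
```

The per-format counts add up to more than 2.29 million, which suggests that some websites use more than one markup format, so the per-format shares should not be expected to sum to 100%.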