Posts Tagged ‘linked data’
- In a sample of over 12 billion web pages, 21 percent (roughly 2.5 billion pages) use schema.org to mark up their HTML, to the tune of more than 15 billion entities and more than 65 billion triples;
- In that same sample, that works out to an average of six entities and 26 facts per page that uses schema.org;
- Just about every major site in every major category, from news to e-commerce (with the exception of Amazon.com), uses it;
- Its ontology counts some 800 properties and 600 classes.
A lot of that has to do with the focus its proponents have had from the beginning on making it very easy for webmasters and developers to adopt and leverage the collection of shared vocabularies for page markup. At this August’s 10th annual Semantic Technology & Business Conference in San Jose, Google Fellow Ramanathan V. Guha, one of the founders of schema.org, shared the progress of the initiative, which set out to develop one vocabulary that would be understood by all search engines, and how it got to where it is today.
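The per-page averages above are easy to reproduce informally. Below is a minimal sketch, using only Python’s standard library, that pulls the schema.org JSON-LD blocks out of a single page and counts entities (anything with an @type) and facts (non-@ property keys). A real crawl like the one behind these numbers would also parse microdata and RDFa, which this ignores; the URL is a stand-in.

```python
import json
import urllib.request
from html.parser import HTMLParser

class JSONLDCollector(HTMLParser):
    """Collects the contents of <script type="application/ld+json"> tags."""
    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_jsonld = False

    def handle_data(self, data):
        if self.in_jsonld:
            self.blocks[-1] += data

def count_entities_and_facts(node):
    """Count typed entities (@type) and property assertions (non-@ keys)."""
    entities, facts = 0, 0
    if isinstance(node, dict):
        if "@type" in node:
            entities += 1
        for key, value in node.items():
            if not key.startswith("@"):
                facts += 1  # one fact per property key, a simplification
            sub_e, sub_f = count_entities_and_facts(value)
            entities, facts = entities + sub_e, facts + sub_f
    elif isinstance(node, list):
        for item in node:
            sub_e, sub_f = count_entities_and_facts(item)
            entities, facts = entities + sub_e, facts + sub_f
    return entities, facts

url = "https://example.com/"  # substitute any page that embeds schema.org JSON-LD
page = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
collector = JSONLDCollector()
collector.feed(page)

total_entities = total_facts = 0
for block in collector.blocks:
    try:
        e, f = count_entities_and_facts(json.loads(block))
    except json.JSONDecodeError:
        continue  # malformed JSON-LD is common in the wild; skip it
    total_entities += e
    total_facts += f
print(f"{total_entities} entities, {total_facts} facts")
```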
Professor Dr. Christian Bizer of the University of Mannheim, Germany, has announced the release of DBpedia 2014. DBpedia is described at dbpedia.org as “… a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link the different data sets on the Web to Wikipedia data. We hope that this work will make it easier for the huge amount of information in Wikipedia to be used in some new interesting ways. Furthermore, it might inspire new mechanisms for navigating, linking, and improving the encyclopedia itself.”
The full announcement on the new release is reprinted below with Bizer’s permission.
DBpedia Version 2014 released
1. The new release is based on updated Wikipedia dumps from April/May 2014 (the 3.9 release was based on dumps from March/April 2013), increasing the number of things described in the English edition from 4.26 million to 4.58 million.
2. The DBpedia ontology has been enlarged, and the number of infobox-to-ontology mappings has risen, leading to richer and cleaner data.
The English version of the DBpedia knowledge base currently describes 4.58 million things, out of which 4.22 million are classified in a consistent ontology (http://wiki.dbpedia.org/Ontology2014), including 1,445,000 persons, 735,000 places (including 478,000 populated places), 411,000 creative works (including 123,000 music albums, 87,000 films and 19,000 video games), 241,000 organizations (including 58,000 companies and 49,000 educational institutions), 251,000 species and 6,000 diseases.
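To make the “sophisticated queries against Wikipedia” concrete, here is a minimal sketch using the SPARQLWrapper Python library against DBpedia’s public endpoint. It asks for people born in Berlin, leaning on the dbo:Person class and dbo:birthPlace property from the ontology release linked above.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbr: <http://dbpedia.org/resource/>

    SELECT ?person ?birthDate WHERE {
        ?person a dbo:Person ;
                dbo:birthPlace dbr:Berlin ;
                dbo:birthDate ?birthDate .
    }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)

# Each binding row maps variable names to {"type": ..., "value": ...} dicts.
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["person"]["value"], row["birthDate"]["value"])
```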
A “Drupal++” platform for semantic web biomedical data – that’s how Sudeshna Das describes eXframe, a reusable framework for creating online repositories of genomics experiments. Das – who among other titles is affiliate faculty of the Harvard Stem Cell Institute – is one of the developers of eXframe, which, in its second-generation version, leverages Stéphane Corlosquet’s RDF module for Drupal to produce, index (into an RDF store powered by the ARC2 PHP library), and publish semantic web data.
“We used the RDF modules to turn eXframe into a semantic web platform,” says Das. “That was key for us because it hid all the complexities of semantic technology.”
One instance of the platform today can be found in the repository for stem cell data that is part of the Stem Cell Commons, the Harvard Stem Cell Institute’s community for stem cell bioinformatics. But Das notes that the real value of the platform’s reusability, since repositories built on it automatically produce Linked Data as well as a SPARQL endpoint, is that new repository instances can be built with much less effort. Working off Drupal as its base, eXframe has been customized to support biomedical data and to integrate biomedical ontologies and knowledge bases.
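eXframe itself sits on Drupal’s PHP stack, but the pattern it automates, describing an experiment as RDF and exposing it to SPARQL queries, can be sketched in a few lines of Python with rdflib. The EXPT vocabulary and URIs below are illustrative assumptions, not eXframe’s actual schema.

```python
from rdflib import Graph, Literal, Namespace, RDF, URIRef

EXPT = Namespace("http://example.org/vocab/experiment#")  # hypothetical vocabulary
g = Graph()
g.bind("expt", EXPT)

exp = URIRef("http://example.org/experiments/42")  # illustrative URI
g.add((exp, RDF.type, EXPT.GenomicsExperiment))
g.add((exp, EXPT.organism, Literal("Homo sapiens")))
g.add((exp, EXPT.assayType, Literal("RNA-seq")))

# The Linked Data serialization a repository instance would publish...
print(g.serialize(format="turtle"))

# ...and the kind of query its SPARQL endpoint would answer.
query = """
    PREFIX expt: <http://example.org/vocab/experiment#>
    SELECT ?exp WHERE { ?exp expt:organism "Homo sapiens" . }
"""
for row in g.query(query):
    print(row.exp)
```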
XSB and SemanticWeb.com Partner in App Developer Challenge to Help Build the Industrial Semantic Web
An invitation was issued to developers at last week’s Semantic Technology & Business Conference: XSB and SemanticWeb.com have joined forces to sponsor the Semantic Web Developer Challenge, which asks participants to build sourcing and product life cycle management applications leveraging XSB’s PartLink Data Model.
XSB is developing PartLink as a project for the Department of Defense Rapid Innovation Fund. It uses semantic web technology to create a coherent Linked Data model for all part information in the Department of Defense’s supply chain – some 40 million parts strong.
“XSB recognized the opportunity to standardize and link together information about the parts, manufacturers, suppliers, materials, [and] technical characteristics using semantic technologies. The parts ontology is deep and detailed with 10,000 parts categories and 1,000 standard attributes defined,” says Alberto Cassola, VP of sales and marketing at XSB, a leading provider of master data management solutions to large commercial and government entities. PartLink’s Linked Data model, he says, “will serve as the foundation for building the industrial semantic web.”
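PartLink’s actual ontology isn’t reproduced here, so the sketch below is only a hedged illustration, in Python’s rdflib, of what a Linked Data description of a single part might look like: a category, a material, a standard attribute, and a link to a manufacturer. The PART vocabulary, URIs, and part number are hypothetical.

```python
from rdflib import Graph, Literal, Namespace, RDF, URIRef

PART = Namespace("http://example.org/partlink/vocab#")  # hypothetical vocabulary
g = Graph()
g.bind("part", PART)

bolt = URIRef("http://example.org/partlink/parts/bolt-0001")  # illustrative URI
g.add((bolt, RDF.type, PART.Fastener))               # one of many part categories
g.add((bolt, PART.material, Literal("stainless steel")))
g.add((bolt, PART.threadDiameterMM, Literal(6.0)))   # one standard attribute
g.add((bolt, PART.manufacturer, URIRef("http://example.org/partlink/mfr/acme")))

print(g.serialize(format="turtle"))
```

The point of the model is that the manufacturer is a URI, not a string, so part records from different supply-chain datasets can be linked through shared identifiers rather than matched by name.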
Dan Gillick and Dave Orr recently wrote, “Language understanding systems are largely trained on freely available data, such as the Penn Treebank, perhaps the most widely used linguistic resource ever created. We have previously released lots of linguistic data ourselves, to contribute to the language understanding community as well as encourage further research into these areas. Now, we’re releasing a new dataset, based on another great resource: the New York Times Annotated Corpus, a set of 1.8 million articles spanning 20 years. 600,000 articles in the NYTimes Corpus have hand-written summaries, and more than 1.5 million of them are tagged with people, places, and organizations mentioned in the article. The Times encourages use of the metadata for all kinds of things, and has set up a forum to discuss related research.”
The post continues: “We recently used this corpus to study a topic called ‘entity salience’. To understand salience, consider: how do you know what a news article or a web page is about? Reading comes pretty easily to people — we can quickly identify the places or things or people most central to a piece of text. But how might we teach a machine to perform this same task? This problem is a key step towards being able to read and understand an article. One way to approach the problem is to look for words that appear more often than their ordinary rates.”
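That last idea, words appearing more often than their ordinary rates, is easy to sketch. The toy scorer below ranks a document’s words by the ratio of their in-document rate to their rate in a background corpus; the background counts are stand-ins, and Google’s actual salience system trained on the annotated corpus is of course far more sophisticated.

```python
from collections import Counter

def salient_words(doc_words, background_counts, background_total, top_n=5):
    """Rank words by how much their in-document rate exceeds their background rate."""
    doc_counts = Counter(doc_words)
    doc_total = len(doc_words)
    scores = {}
    for word, count in doc_counts.items():
        doc_rate = count / doc_total
        # add-one smoothing so words unseen in the background don't divide by zero
        bg_rate = (background_counts.get(word, 0) + 1) / (background_total + 1)
        scores[word] = doc_rate / bg_rate
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Illustrative background statistics; a real system would use corpus-wide counts.
background = Counter({"the": 70000, "said": 9000, "york": 1200, "mayor": 300})
doc = "the mayor of new york said the budget the mayor proposed failed".split()
print(salient_words(doc, background, background_total=100000))
```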
Photo credit: Eric Franzon
These vistas will be explored in a session hosted by Kevin Ford, digital project coordinator at the Library of Congress, at next week’s Semantic Technology & Business Conference in San Jose. The door is being opened by the Bibliographic Framework Initiative (BIBFRAME), which the LOC launched a few years ago. Libraries will be moving from the MARC standards, their lingua franca for representing and communicating bibliographic and related information in machine-readable form, to BIBFRAME, which models bibliographic data in RDF using semantic technologies.
If you’re interested in Linked Data, no doubt you’re planning to listen in on next week’s Semantic Web Blog webinar, Getting Started With The Linked Data Platform (register here), featuring Arnaud Le Hors, Linked Data Standards Lead at IBM and chair of the W3C Linked Data Platform WG and the OASIS OSLC Core TC. It may also be on your agenda to attend this month’s Semantic Technology & Business Conference, where speakers including Le Hors, Manu Sporny, and Sandro Hawke will be presenting Linked Data-focused sessions.
In the meantime, though, you might enjoy reviewing the results of the LOD2 Project, the European Commission co-funded effort whose four-year run, begun in 2010, aimed to advance RDF data management; the extraction, creation, and enrichment of structured RDF data; the interlinking of data from different sources; and the authoring, exploration, and visualization of Linked Data. To that end, why not take a stroll through the recently released Linked Open Data – Creating Knowledge Out of Interlinked Data, edited by LOD2 Project participants Sören Auer of the Institut für Informatik III at the Rheinische Friedrich-Wilhelms-Universität Bonn; Volha Bryl of the University of Mannheim; and Sebastian Tramp of the University of Leipzig?
In mid-July Dataversity.net, the sister site of The Semantic Web Blog, hosted a webinar on Understanding The World of Cognitive Computing. Semantic technology naturally came up during the session, which was moderated by Steve Ardire, an advisor to cognitive computing, artificial intelligence, and machine learning startups. You can find a recording of the event here.
You can find a more detailed discussion of the session at large here; below are some excerpts related to how the worlds of cognitive computing and semantic technology interact.
One of the panelists, IBM Big Data Evangelist James Kobielus, discussed what he thinks is missing from general discussions of cognitive computing to make it a reality. “How do we normally perceive branches of AI, and clearly the semantic web and semantic analysis related to natural language processing and so much more has been part of the discussion for a long time,” he said. When it comes to finding the sense in multi-structured – including unstructured – content that might be text, audio, images or video, “what’s absolutely essential is that as you extract the patterns you are able to tag the patterns, the data, the streams, really deepen the metadata that gets associated with that content and share that metadata downstream to all consuming applications so that they can fully interpret all that content, those objects…[in] whatever the relevant context is.”