Technologies

Retrieving and Using Taxonomy Data from DBpedia

DBpedia, as described in the recent semanticweb.com article DBpedia 2014 Announced, is “a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web.” It currently has over 3 billion triples (that is, facts stored using the W3C standard RDF data model) available for use by applications, making it a cornerstone of the semantic web.

A surprising amount of this data is expressed using the SKOS vocabulary, the W3C standard model for taxonomies used by the Library of Congress, the New York Times, and many other organizations to publish their taxonomies and subject headings. (semanticweb.com has covered SKOS many times in the past.) DBpedia has data about over a million SKOS concepts, arranged hierarchically and ready for you to pull down with simple queries so that you can use them in your RDF applications to add value to your own content and other data.

Where is this taxonomy data in DBpedia?

Many people think of DBpedia as mostly storing the fielded “infobox” information that you see in the gray boxes on the right side of Wikipedia pages—for example, the names of the founders and the net income figures that you see on the right side of the Wikipedia page for IBM. If you scroll to the bottom of that page, you’ll also see the categories that have been assigned to IBM in Wikipedia such as “Companies listed on the New York Stock Exchange” and “Computer hardware companies.” The Wikipedia page for Computer hardware companies lists companies that fall into this category, as well as two other interesting sets of information: subcategories (or, in taxonomist parlance, narrower categories) such as “Computer storage companies” and “Fabless semiconductor companies,” and then, at the bottom of the page, categories that are broader than “Computer hardware companies” such as “Computer companies” and “Electronics companies.”

How does DBpedia store this categorization information? The DBpedia page for IBM shows that DBpedia includes triples saying that IBM has Dublin Core subject values such as category:Companies_listed_on_the_New_York_Stock_Exchange and category:Computer_hardware_companies. The DBpedia page for the category Computer_hardware_companies shows that it is a SKOS concept with values for the two key properties of a SKOS concept: a preferred label and broader values. The category:Computer_hardware_companies concept is itself the broader value of several other concepts, such as category:Fabless_semiconductor_companies. Because it is the broader value of other concepts and has broader values of its own, it can be both a parent node and a child node in a tree of taxonomic terms, so DBpedia has the data that lets you build a taxonomy hierarchy around any of its categories.
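To make that concrete, here is a minimal SPARQL query that should run as-is against DBpedia’s public endpoint at http://dbpedia.org/sparql, retrieving the broader concepts of the Computer_hardware_companies category along with their English labels:

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX category: <http://dbpedia.org/resource/Category:>

# Find the parent categories of Computer_hardware_companies
# and their English preferred labels.
SELECT ?broader ?label
WHERE {
  category:Computer_hardware_companies skos:broader ?broader .
  ?broader skos:prefLabel ?label .
  FILTER (lang(?label) = "en")
}

Reverse the direction of the first triple pattern (?narrower skos:broader category:Computer_hardware_companies) and the same query walks down the tree to the narrower concepts instead.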

Read more

NEW WEBINAR Announced: Yosemite Project – Part 3

“Transformations for Integrating VA data with FHIR in RDF”

SemanticWeb.com recently launched a series of webinars on the topic of “RDF as a Universal Healthcare Exchange Language.”

Part 1 of that series, “The Yosemite Project: An RDF Roadmap for Healthcare Information Interoperability,” is available as a recorded webinar and slide deck.

Part 2, “The Ideal Medium for Health Data? A Dive into Lab Tests,” will take place on November 7, 2014 (registration is open as of this writing).

Announcing Part 3:

click here to register now!
TITLE: Transformations for Integrating VA data with FHIR in RDF
DATE: Wednesday, November 12, 2014
TIME: 2 PM Eastern / 11 AM Pacific
PRICE: Free to all attendees
DESCRIPTION: In our series on The Yosemite Project, we explore RDF as a data standard for health data. In this installment, we will hear from Rafael Richards, Physician Informatician, Office of Informatics and Analytics in the Veterans Health Administration (VHA), about “Transformations for Integrating VA data with FHIR in RDF.”

The VistA EHR has its own data model and vocabularies for representing healthcare data. This webinar describes how the SPARQL Inferencing Notation (SPIN) can be used to translate VistA data to the data representation used by FHIR, an emerging interchange standard.
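Because SPIN rules are expressed as SPARQL, the flavor of such a translation can be sketched as a CONSTRUCT query. Note that the vista: and fhir: terms below are invented placeholders for illustration only; the webinar covers the actual VistA and FHIR vocabularies:

# A hypothetical SPIN-style translation rule. The vista: and fhir:
# vocabularies here are invented for illustration.
PREFIX vista: <http://example.org/vista#>
PREFIX fhir:  <http://example.org/fhir#>

CONSTRUCT {
  ?obs a fhir:Observation ;
       fhir:subject ?patient ;
       fhir:valueQuantity ?value .
}
WHERE {
  ?obs a vista:VitalMeasurement ;
       vista:patient ?patient ;
       vista:reading ?value .
}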


Read more

NEW WEBINAR Announced: Yosemite Project – Part 2

“The Ideal Medium for Health Data? A Dive into Lab Tests”

SemanticWeb.com recently launched a series of webinars on the topic of “RDF as a Universal Healthcare Exchange Language.” Part 1 of that series, “The Yosemite Project: An RDF Roadmap for Healthcare Information Interoperability,” is available as a recorded webinar and slide deck at:
http://semanticweb.com/webinar-yosemite-project-part-1-rdf-roadmap-healthcare-information-interoperability-video_b44757

Announcing Yosemite Project – Part 2:

click here to register now!
TITLE: The Ideal Medium for Health Data? A Dive into Lab Tests
DATE: Friday, November 7, 2014
TIME: 2 PM Eastern / 11 AM Pacific
PRICE: Free to all attendees
DESCRIPTION: In our series on The Yosemite Project, we explore RDF as a data standard for health data. In this installment, we will hear from Conor Dowling, CTO of Caregraf, about “The Ideal Medium for Health Data? A Dive into Lab Tests.”

Lab tests and results have many dimensions, from the substances measured to timing to the condition of the patient. This presentation will show how RDF is the best medium for fully capturing this highly nuanced data.
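To give a feel for what that means in RDF terms, here is a hypothetical sketch (the ex: vocabulary is invented for illustration, not taken from the webinar). Because each dimension of a result is just another triple, a query can select exactly the dimensions it needs, and new dimensions can be added later without restructuring the data:

PREFIX ex: <http://example.org/lab#>

# Each dimension of a lab result -- patient, analyte, value, unit,
# timing -- is its own triple on the result resource.
SELECT ?patient ?analyte ?value ?unit ?takenAt
WHERE {
  ?result a ex:LabResult ;
          ex:patient ?patient ;
          ex:analyte ?analyte ;
          ex:value ?value ;
          ex:unit ?unit ;
          ex:observedAt ?takenAt .
}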


Read more

WEBINAR: The Yosemite Project – Part 1: An RDF Roadmap for Healthcare Information Interoperability (VIDEO)

In case you missed last Friday’s webinar, “The Yosemite Project – Part 1: An RDF Roadmap for Healthcare Information Interoperability,” delivered by David Booth, the recording and slides are now available (and posted below). The webinar was co-produced by SemanticWeb.com and DATAVERSITY.net and runs for one hour, including a Q&A session with the audience that attended the live broadcast.

If you watch this webinar, please use the comments section below to share your questions, comments, and ideas for webinars you would like to see in the future.

About the Webinar

Interoperability of electronic healthcare information remains an enormous challenge in spite of 100+ available healthcare information standards. This webinar explains the Yosemite Project, whose mission is to achieve semantic interoperability of all structured healthcare information through RDF as a common semantic foundation. It explains the rationale and technical strategy of the Yosemite Project, and describes how RDF and related standards address a two-pronged strategy for semantic interoperability: facilitating collaborative standards convergence whenever possible, and crowd-sourced data translations when necessary.

Read more

Introducing GEMS, a Multilayer Software System for Graph Databases

The Pacific Northwest National Laboratory recently reported on Phys.org, “As computing tools and expertise used in conducting scientific research continue to expand, so have the enormity and diversity of the data being collected. Developed at Pacific Northwest National Laboratory, the Graph Engine for Multithreaded Systems, or GEMS, is a multilayer software system for semantic graph databases. In their work, scientists from PNNL and NVIDIA Research examined how GEMS answered queries on science metadata and compared its scaling performance against generated benchmark data sets. They showed that GEMS could answer queries over science metadata in seconds and scaled well to larger quantities of data.” Read more

GitHub Adds schema.org Actions to Email Notifications via JSON-LD

Stéphane Corlosquet has noticed that GitHub has added schema.org Actions using the JSON-LD syntax to the notification emails that GitHub users receive.

On Twitter, Corlosquet posted:

“Looks like @github just started to use http://schema.org actions with JSON-LD in their notifications emails!”
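The markup involved looks roughly like the snippet below, embedded in the HTML of the notification email. This is a sketch modeled on the schema.org email-actions examples of the time; the exact fields and values GitHub emits may differ:

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "EmailMessage",
  "action": {
    "@type": "ViewAction",
    "url": "https://github.com/example/repo/issues/1",
    "name": "View Issue"
  },
  "description": "View this Issue on GitHub"
}
</script>

Because the action is declared as structured data rather than buried in the email body, mail clients that understand schema.org Actions can surface it directly, for example as a “View Issue” button next to the message.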

Read more

Semantic Interoperability of Electronic Healthcare Info On The Agenda At U.S. Veterans Health Administration

The Yosemite Project, unveiled at this August’s Semantic Technology & Business Conference during the second annual “RDF as a Universal Healthcare Exchange Language” panel, lays out a roadmap for leveraging RDF to make all structured healthcare information semantically interoperable. (The Semantic Web Blog’s sister publication, Dataversity.net, has an article on its site explaining the details of that roadmap.)

The Yosemite Project grew out of the Yosemite Manifesto announced at the 2013 SemTechBiz conference (see our story here). David Booth, senior software architect at Hawaii Resource Group, who led the RDF Healthcare panels at both the 2013 and 2014 conferences, has now mapped the Manifesto’s goals into the Project’s guidelines for the journey to semantic interoperability. The approach taken by the Yosemite Project matches that of others in the healthcare sector who want to see semantic interoperability of electronic healthcare information.

Among them are Booth’s fellow panelists at this year’s event, including Rafael Richards, physician informaticist at the U.S. Veterans Health Administration, which counts 1,200 care sites in its portfolio. Richards comments on that alignment as it relates to the work he is leading in the Linked Vitals project: integrating the VA’s VistA electronic health records system with data types conforming to the Fast Healthcare Interoperability Resources (FHIR) standard for data exchange, and with information types supporting the Logical Observation Identifiers Names and Codes (LOINC) database, which facilitates the exchange and pooling of results for clinical care, outcomes management, and research.

Read more

Deconstructing JSON-LD

Photo, clockwise from top left: Aaron Bradley, Gregg Kellogg, Phil Archer, Stéphane Corlosquet.

Aaron Bradley recently posted a roundtable discussion about JSON-LD, which includes: “JSON-LD is everywhere. Okay, perhaps not everywhere, but JSON-LD loomed large at the 2014 Semantic Web Technology and Business Conference in San Jose, where it was on many speakers’ lips, and could be seen in the code examples of many presentations. I’ve read much about the format – and have even provided a thumbnail definition of JSON-LD in these pages – but I wanted to take advantage of the conference to learn more about JSON-LD, and to better understand why this very recently-developed standard has been such a runaway hit with developers. In this quest I could not have been more fortunate than to sit down with Gregg Kellogg, one of the editors of the W3C Recommendation for JSON-LD, to learn more about the format, its promise as a developmental tool, and – particularly important to me as a search marketer – the role in the evolution of schema.org.”

Read more

Introducing SPARQLGraph, a Platform for Querying Biological Semantic Web Databases

Dominik Schweiger, Zlatko Trajanoski and Stephan Pabinger recently wrote, “Semantic Web has established itself as a framework for using and sharing data across applications and database boundaries. Here, we present a web-based platform for querying biological Semantic Web databases in a graphical way. Results: SPARQLGraph offers an intuitive drag & drop query builder, which converts the visual graph into a query and executes it on a public endpoint. The tool integrates several publicly available Semantic Web databases, including the databases of the just recently released EBI RDF platform. Furthermore, it provides several predefined template queries for answering biological questions. Users can easily create and save new query graphs, which can also be shared with other researchers.” Read more

SPARQL City’s Benchmark Results Showcase New Possibilities in Enterprise Graph Analytics

Solution demonstrates 10x+ the performance while running on 100x the data

NoSQL Now 2014 & SemTechBiz 2014

San Diego – August 20, 2014 – SPARQL City, which introduced its scalable graph analytic engine to market earlier this year, today announced that it has successfully run the SP2 SPARQL benchmark on 100 times the data volume used by other graph solution providers, while still delivering an order-of-magnitude better average performance compared to published results.

SPARQL City ran the SP2 Benchmark against 2.5 billion triples/edges on a sixteen-node cluster on Amazon EC2. Average query response time for the set of seventeen queries was about 6 seconds, with query 4, the most data-intensive query (it touches the entire dataset), taking approximately 34 seconds to run. By comparison, the best query 4 result reported by other graph solution providers has been around 15 seconds, but against only 25 million triples/edges, or 1/100th of the data volume in SPARQL City’s benchmark test. This level of performance, combined with the ability to scale the solution out on a cluster when required, makes easy-to-use, interactive graph analytics on very large datasets possible for the first time. Detailed benchmark results can be found on our website.

Read more
