DBpedia, as described in the recent semanticweb.com article DBpedia 2014 Announced, is “a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web.” It currently has over 3 billion triples (that is, facts stored using the W3C standard RDF data model) available for use by applications, making it a cornerstone of the semantic web.
A surprising amount of this data is expressed using the SKOS vocabulary, the W3C standard model for taxonomies used by the Library of Congress, the New York Times, and many other organizations to publish their taxonomies and subject headers. (semanticweb.com has covered SKOS many times in the past.) DBpedia has data about over a million SKOS concepts, arranged hierarchically and ready for you to pull down with simple queries so that you can use them in your RDF applications to add value to your own content and other data.
Where is this taxonomy data in DBpedia?
Many people think of DBpedia as mostly storing the fielded “infobox” information that you see in the gray boxes on the right side of Wikipedia pages—for example, the names of the founders and the net income figures that you see on the right side of the Wikipedia page for IBM. If you scroll to the bottom of that page, you’ll also see the categories that have been assigned to IBM in Wikipedia such as “Companies listed on the New York Stock Exchange” and “Computer hardware companies.” The Wikipedia page for Computer hardware companies lists companies that fall into this category, as well as two other interesting sets of information: subcategories (or, in taxonomist parlance, narrower categories) such as “Computer storage companies” and “Fabless semiconductor companies,” and then, at the bottom of the page, categories that are broader than “Computer hardware companies” such as “Computer companies” and “Electronics companies.”
How does DBpedia store this categorization information? The DBpedia page for IBM shows that DBpedia includes triples saying that IBM has Dublin Core subject values such as
category:Computer_hardware_companies. The DBpedia page for the category
Computer_hardware_companies shows that is a SKOS concept with values for the two key properties of a SKOS concept: a preferred label and broader values. The
category:Computer_hardware_companies concept is itself the broader value of several other concepts such as
category:Fabless_semiconductor_companies. Because it’s the broader value of other concepts and has its own broader values, it can be both a parent node and a child node in a tree of taxonomic terms, so DBpedia has the data that lets you build a taxonomy hierarchy around any of its categories.