Posts Tagged ‘DBpedia’

Retrieving and Using Taxonomy Data from DBpedia

DBpedia, as described in the recent semanticweb.com article DBpedia 2014 Announced, is “a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web.” It currently has over 3 billion triples (that is, facts stored using the W3C standard RDF data model) available for use by applications, making it a cornerstone of the semantic web.

A surprising amount of this data is expressed using the SKOS vocabulary, the W3C standard model for taxonomies used by the Library of Congress, the New York Times, and many other organizations to publish their taxonomies and subject headings. (semanticweb.com has covered SKOS many times in the past.) DBpedia has data about over a million SKOS concepts, arranged hierarchically and ready for you to pull down with simple queries so that you can use them in your RDF applications to add value to your own content and other data.

Where is this taxonomy data in DBpedia?

Many people think of DBpedia as mostly storing the fielded “infobox” information that you see in the gray boxes on the right side of Wikipedia pages—for example, the names of the founders and the net income figures that you see on the right side of the Wikipedia page for IBM. If you scroll to the bottom of that page, you’ll also see the categories that have been assigned to IBM in Wikipedia such as “Companies listed on the New York Stock Exchange” and “Computer hardware companies.” The Wikipedia page for Computer hardware companies lists companies that fall into this category, as well as two other interesting sets of information: subcategories (or, in taxonomist parlance, narrower categories) such as “Computer storage companies” and “Fabless semiconductor companies,” and then, at the bottom of the page, categories that are broader than “Computer hardware companies” such as “Computer companies” and “Electronics companies.”

How does DBpedia store this categorization information? The DBpedia page for IBM shows that DBpedia includes triples saying that IBM has Dublin Core subject values such as category:Companies_listed_on_the_New_York_Stock_Exchange and category:Computer_hardware_companies. The DBpedia page for the category Computer_hardware_companies shows that it is a SKOS concept with values for the two key properties of a SKOS concept: a preferred label and broader values. The category:Computer_hardware_companies concept is itself the broader value of several other concepts such as category:Fabless_semiconductor_companies. Because it’s the broader value of other concepts and has its own broader values, it can be both a parent node and a child node in a tree of taxonomic terms, so DBpedia has the data that lets you build a taxonomy hierarchy around any of its categories.
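To make the parent/child idea concrete, here is a minimal Python sketch, not the article's own code. It builds a SPARQL query asking for the concepts that name a given category as their skos:broader value, and it turns (narrower, broader) pairs into an indented tree. The helper names (narrower_query, request_url, build_tree, render) are hypothetical, the sample pairs are hand-written from the categories named above rather than fetched from DBpedia, and the code only constructs a request URL for the public endpoint at http://dbpedia.org/sparql without sending it.

```python
from collections import defaultdict
from urllib.parse import urlencode

SKOS = "http://www.w3.org/2004/02/skos/core#"
CATEGORY_NS = "http://dbpedia.org/resource/Category:"
ENDPOINT = "http://dbpedia.org/sparql"  # DBpedia's public SPARQL endpoint


def narrower_query(category):
    """Build a SPARQL query for concepts whose skos:broader value is `category`."""
    return (
        f"PREFIX skos: <{SKOS}>\n"
        "SELECT ?narrower ?label WHERE {\n"
        f"  ?narrower skos:broader <{CATEGORY_NS}{category}> ;\n"
        "            skos:prefLabel ?label .\n"
        "}"
    )


def request_url(query):
    """Encode the query as a SPARQL-protocol GET URL (built here, never sent)."""
    return ENDPOINT + "?" + urlencode({"query": query})


def build_tree(broader_pairs):
    """Map each broader concept to the sorted list of its narrower concepts."""
    children = defaultdict(list)
    for narrower, broader in broader_pairs:
        children[broader].append(narrower)
    return {parent: sorted(kids) for parent, kids in children.items()}


def render(tree, root, depth=0, seen=None):
    """Return indented lines for the subtree under `root`.

    Concepts already visited are skipped, as a guard in case the
    category graph contains a cycle.
    """
    seen = set() if seen is None else seen
    if root in seen:
        return []
    seen.add(root)
    lines = ["  " * depth + root]
    for child in tree.get(root, []):
        lines.extend(render(tree, child, depth + 1, seen))
    return lines


# Hand-written sample data (an assumption for illustration, not query results).
SAMPLE_PAIRS = [
    ("Fabless_semiconductor_companies", "Computer_hardware_companies"),
    ("Computer_storage_companies", "Computer_hardware_companies"),
    ("Computer_hardware_companies", "Computer_companies"),
    ("Computer_hardware_companies", "Electronics_companies"),
]

if __name__ == "__main__":
    print(narrower_query("Computer_hardware_companies"))
    print(request_url(narrower_query("Computer_hardware_companies")))
    print("\n".join(render(build_tree(SAMPLE_PAIRS), "Computer_companies")))
```

Because Computer_hardware_companies appears on both sides of the sample pairs, it shows up as a child of Computer_companies and as the parent of the two narrower categories, which is the dual role described above. Rendering from Electronics_companies instead would give the same subtree under a different root.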

Read more

AlchemyAPI’s New Face Detection And Recognition API Boosts Entity Information Courtesy Of Its Knowledge Graph

AlchemyAPI has released its AlchemyVision Face Detection/Recognition API, which, in response to an image file or URI, returns the position, age, gender, and, in the case of celebrities, the identities of the people in the photo and connections to their web sites, DBpedia links and more.

According to founder and CEO Elliot Turner, it’s taking a different direction than Google and Baidu with its visual recognition technology. Those two vendors, he says in an email response to questions from The Semantic Web Blog, “use their visual recognition technology internally for their own competitive advantage.  We are democratizing these technologies by providing them as an API and sharing them with the world’s software developers.”

The business case for those developers to leverage the Face Detection/Recognition API includes demographic profiling: companies can use facial recognition to understand the age and gender characteristics of their audience based on profile images and sharing activity, Turner says.

Read more

DBpedia 2014 Announced

Professor Dr. Christian Bizer of the University of Mannheim, Germany, has announced the release of DBpedia 2014. DBpedia is described at dbpedia.org as “… a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link the different data sets on the Web to Wikipedia data. We hope that this work will make it easier for the huge amount of information in Wikipedia to be used in some new interesting ways. Furthermore, it might inspire new mechanisms for navigating, linking, and improving the encyclopedia itself.”

The full announcement on the new release is reprinted below with Bizer’s permission.

****************

DBpedia Version 2014 released

1. the new release is based on updated Wikipedia dumps dating from April / May 2014 (the 3.9 release was based on dumps from March / April 2013), leading to an overall increase of the number of things described in the English edition from 4.26 to 4.58 million things.

2. the DBpedia ontology is enlarged and the number of infobox to ontology mappings has risen, leading to richer and cleaner data.

The English version of the DBpedia knowledge base currently describes 4.58 million things, out of which 4.22 million are classified in a consistent ontology (http://wiki.dbpedia.org/Ontology2014), including 1,445,000 persons, 735,000 places (including 478,000 populated places), 411,000 creative works (including 123,000 music albums, 87,000 films and 19,000 video games), 241,000 organizations (including 58,000 companies and 49,000 educational institutions), 251,000 species and 6,000 diseases.

Read more

Step Right Up To Contribute To The Web of Meaning

Photo Courtesy: Flickr/Charlotte L

Are you looking for opportunities to contribute to the web of meaning that are appropriate to filling some hours in these last lazy days of summer? Something a little less taxing than, say, creating and publishing a Linked Data set on the web?

They’re out there. Here are a few to keep you engaged while you’re soaking up the sun, hopefully on some tropical island with a warm breeze blowing and a cool drink in hand. For those of you at this week’s Semantic Web Technology and Business conference, don’t worry – these should still be waiting for your input when you get back.

Read more

Studio Ousia Envisions A World Of Semantic Augmented Reality

Image courtesy: Flickr/by Filter Forge


Ikuya Yamada, co-founder and CTO of Studio Ousia, the company behind Linkify – the technology to automatically extract certain keywords and add intelligent hyperlinks to them to accelerate mobile search – recently sat down with The Semantic Web Blog to discuss the company’s work, including its vision of Semantic AR (augmented reality).

The Semantic Web Blog: You spoke at last year’s SEEDS Conference on the subject of linking things and information and the vision of Semantic AR, which includes the idea of delivering additional information to users before they even launch a search for it. Explain your technology’s relation to that vision of finding and delivering the information users need while they are consuming content – even just looking at a word.

Yamada: The main focus of our technology is extracting accurately only a small amount of interesting keywords from text [around people, places, or things]. …We also develop a content matching system that matches those keywords with other content on the web – like a singer [keyword] with a song or a location [keyword] with a map. By combining keyword extraction and the content matching engine, we can augment text using information on the web.

Read more

The Web Is 25 — And The Semantic Web Has Been An Important Part Of It

NOTE: This post was updated at 5:40pm ET.

Today the Web celebrates its 25th birthday, and we celebrate the Semantic Web’s role in that milestone. And what a milestone it is: As of this month, the Indexed Web contains at least 2.31 billion pages, according to WorldWideWebSize.  

The Semantic Web Blog reached out to the World Wide Web Consortium’s current and former semantic leads to get their perspective on the roads The Semantic Web has traveled and the value it has so far brought to the Web’s table: Phil Archer, W3C Data Activity Lead coordinating work on the Semantic Web and related technologies; Ivan Herman, who last year transitioned roles at the W3C from Semantic Activity Lead to Digital Publishing Activity Lead; and Eric Miller, co-founder and president of Zepheira and the leader of the Semantic Web Initiative at the W3C until 2007.

While the Semantic Web came to the attention of the wider public in 2001, with the publication in Scientific American of The Semantic Web by Tim Berners-Lee, James Hendler and Ora Lassila, Archer points out that “one could argue that the Semantic Web is 25 years old,” too. He cites Berners-Lee’s March 1989 paper, Information Management: A Proposal, which includes a diagram showing relationships that are immediately recognizable as triples. “That’s how Tim envisaged it from Day 1,” Archer says.

Read more

Dandelion’s New Bloom: A Family Of Semantic Text Analysis APIs

Dandelion, the service from SpazioDati whose goal is to deliver linked and enriched data for apps, has recently introduced a new suite of products related to semantic text analysis.

Its dataTXT family of semantic text analysis APIs includes dataTXT-NEX, a named entity recognition API that links entities in the input sentence with Wikipedia and DBpedia and, in turn, with the Linked Open Data cloud, and dataTXT-SIM, an experimental semantic similarity API that computes the semantic distance between two short sentences. TXT-CL (now in beta) is a categorization service that classifies short sentences into user-defined categories, says SpazioDati CEO Michele Barbera.

“The advantage of the dataTXT family compared to existing text analysis’ tools is that dataTXT relies neither on machine learning nor NLP techniques,” says Barbera. “Rather it relies entirely on the topology of our underlying knowledge graph to analyze the text.” Dandelion’s knowledge graph merges together several Open Community Data sources (such as DBpedia) and private data collected and curated by SpazioDati. It’s still in private beta and not yet publicly accessible, though plans are to gradually open up portions of the graph in the future via the service’s upcoming Datagem APIs, “so that developers will be able to access the same underlying structured data by linking their own content with dataTXT APIs or by directly querying the graph with the Datagem APIs; both of them will return the same resource identifiers,” Barbera says. (See the Semantic Web Blog’s initial coverage of Dandelion here, including additional discussion of its knowledge graph.)

Read more

Music Discovery Service seevl.fm Launches

This week marked the public launch of seevl.fm.

SemanticWeb.com has tracked seevl’s development through various incarnations, including a YouTube plugin and as a service for users of Deezer (available as a Deezer app). This week’s development, however, sees the service emerge as a stand-alone, cross-browser, cross-platform, mobile-ready service; a service that is free and allows for unlimited search and discovery. So, what can one do with seevl?

Following the death of Lou Reed this week, I (not surprisingly) saw mentions of the artist skyrocket across my social networks. People were sharing memories and seeking information — album and song titles, lyrics, biographies, who influenced Reed, who Reed influenced, and a lot of people simply wanted to listen to Reed’s music.  A quick look at the seevl.fm listing for Lou Reed shows a wealth of information including a music player pre-populated with some of the artist’s greatest hits.

Read more

Picture This: Muséophile Leads Art Lovers To Art Works

Where in Paris might you find an exhibit featuring the artist Charles Le Brun?

If you didn’t know to check into the Musée du Louvre, Muséophile can help. Just plug in an artist or movement, and city, and see what comes to the fore. The service, still in beta, comes courtesy of Sémanticpédia, a platform for collaboration between France’s Ministry of Culture and Communication; INRIA, the country’s public institution of scientific and technological research; and Wikimedia France, whose aim is to perform research and development applied to corpus or collaborative cultural projects, using data extracted from Wikimedia projects.

Currently, data from French DBpedia is available for Sémanticpédia projects to leverage, and Muséophile is the first effort of the collaboration to do so. In fact, an overall effort by the government of France to boost the representation of French cultural resources on the web, which should aid in Muséophile’s continued development, is underway: Those within the Ministry of Culture and Communication with expertise in various content bases related to French culture are being charged to contribute their knowledge to the still youthful French DBpedia, which constitutes an extraction of structured information from the French Wikipedia. That will lead to a single reference system that they also can rely on to collaborate, and exchange and integrate data.

Read more

DBpedia 3.9 Hits The Runway


DBpedia 3.9 is up and going. Word came today from Christian Bizer and Christopher Sahnwaldt that the new release boasts an overall increase in the number of concepts in the English edition from 3.7 to 4 million things, thanks to being based on updated Wikipedia dumps from the spring of 2013.
Other numbers to impress:

Read more
