Bob DuCharme recently wrote, “The combination of microdata and schema.org seems to have hit a sweet spot that has helped both to get a lot of traction. I’ve been learning more about microdata recently, but even before I did, I found that the W3C’s Microdata to RDF Distiller written by Ivan Herman would convert microdata stored in web pages into RDF triples, making it possible to query this data with SPARQL. With major retailers such as Walmart and BestBuy making such data available on—as far as I can tell—every single product’s web page, this makes some interesting queries possible to compare prices and other information from the two vendors.”
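To make the conversion step more concrete, here is a toy Python sketch of what turning microdata into triples involves. It is not the Distiller's code and it skips much of the W3C Microdata to RDF algorithm: items become blank nodes, `itemprop` names are not expanded to full vocabulary URIs, and `itemid`/`itemref` are ignored. The class name `MicrodataToTriples` and the product markup are hypothetical.

```python
from html.parser import HTMLParser

VOID_TAGS = {"meta", "link", "img", "br", "hr", "input", "source"}
URL_ATTR = {"a": "href", "link": "href", "img": "src"}  # where URL-valued props live


class MicrodataToTriples(HTMLParser):
    """Simplified microdata-to-triples extractor (a sketch, not the W3C algorithm).

    Items become blank nodes (_:b0, _:b1, ...); itemprop names are kept as-is
    instead of being expanded to vocabulary URIs; itemid and itemref are ignored.
    """

    def __init__(self):
        super().__init__()
        self.triples = []
        self._frames = []  # one entry per currently open non-void element
        self._next_id = 0

    def _current_item(self):
        # The nearest enclosing element that carries itemscope, if any.
        for frame in reversed(self._frames):
            if frame["item"] is not None:
                return frame["item"]
        return None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        subject = self._current_item()
        prop = attrs.get("itemprop") if subject is not None else None
        item = None
        if "itemscope" in attrs:
            item = f"_:b{self._next_id}"
            self._next_id += 1
            if attrs.get("itemtype"):
                self.triples.append((item, "rdf:type", attrs["itemtype"]))
            if prop:
                # A nested item is the value of its parent item's property.
                self.triples.append((subject, prop, item))
            prop = None
        elif prop and tag == "meta":
            self.triples.append((subject, prop, attrs.get("content") or ""))
            prop = None
        elif prop and tag in URL_ATTR:
            self.triples.append((subject, prop, attrs.get(URL_ATTR[tag]) or ""))
            prop = None
        if tag not in VOID_TAGS:
            # Any remaining prop takes this element's text content as its value.
            self._frames.append({"tag": tag, "item": item, "subject": subject,
                                 "prop": prop, "text": []})

    def handle_data(self, data):
        for frame in self._frames:
            if frame["prop"]:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if not any(f["tag"] == tag for f in self._frames):
            return  # stray end tag, or the end tag of a void element
        while self._frames:
            frame = self._frames.pop()
            if frame["prop"]:
                text = " ".join("".join(frame["text"]).split())
                self.triples.append((frame["subject"], frame["prop"], text))
            if frame["tag"] == tag:
                break


# Hypothetical product markup in the schema.org style described above.
html = """
<div itemscope itemtype="http://schema.org/Product">
  <span itemprop="name">Example Widget</span>
  <div itemprop="offers" itemscope itemtype="http://schema.org/Offer">
    <meta itemprop="priceCurrency" content="USD">
    <span itemprop="price">19.99</span>
  </div>
</div>
"""
parser = MicrodataToTriples()
parser.feed(html)
parser.close()
for triple in parser.triples:
    print(triple)
```

Once pages from both retailers are reduced to triples like these, a price comparison becomes a SPARQL query that joins offers on some product identifier the two sites share; which identifier that would be is not assumed here.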
Posts Tagged ‘Bob DuCharme’
DBpedia, as described in the recent semanticweb.com article DBpedia 2014 Announced, is “a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web.” It currently has over 3 billion triples (that is, facts stored using the W3C standard RDF data model) available for use by applications, making it a cornerstone of the semantic web.
A surprising amount of this data is expressed using the SKOS vocabulary, the W3C standard model for taxonomies used by the Library of Congress, the New York Times, and many other organizations to publish their taxonomies and subject headings. (semanticweb.com has covered SKOS many times in the past.) DBpedia has data about over a million SKOS concepts, arranged hierarchically and ready for you to pull down with simple queries so that you can use them in your RDF applications to add value to your own content and other data.
Where is this taxonomy data in DBpedia?
Many people think of DBpedia as mostly storing the fielded “infobox” information that you see in the gray boxes on the right side of Wikipedia pages—for example, the names of the founders and the net income figures that you see on the right side of the Wikipedia page for IBM. If you scroll to the bottom of that page, you’ll also see the categories that have been assigned to IBM in Wikipedia such as “Companies listed on the New York Stock Exchange” and “Computer hardware companies.” The Wikipedia page for Computer hardware companies lists companies that fall into this category, as well as two other interesting sets of information: subcategories (or, in taxonomist parlance, narrower categories) such as “Computer storage companies” and “Fabless semiconductor companies,” and then, at the bottom of the page, categories that are broader than “Computer hardware companies” such as “Computer companies” and “Electronics companies.”
How does DBpedia store this categorization information? The DBpedia page for IBM shows that DBpedia includes triples saying that IBM has Dublin Core subject values such as category:Computer_hardware_companies. The DBpedia page for the category Computer_hardware_companies shows that it is a SKOS concept with values for the two key properties of a SKOS concept: a preferred label and broader values. The category:Computer_hardware_companies concept is itself the broader value of several other concepts, such as category:Fabless_semiconductor_companies. Because it is the broader value of other concepts and also has broader values of its own, it can be both a parent node and a child node in a tree of taxonomic terms, so DBpedia has the data that lets you build a taxonomy hierarchy around any of its categories.
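The hierarchy-building idea can be sketched in Python with a handful of the triples described above held in memory. The prefixes `dbr:`, `dct:`, `skos:`, and `category:` abbreviate the full URIs DBpedia uses, and the function names are hypothetical; against the live data you would fetch the `skos:broader` triples with a SPARQL query rather than hard-coding them. The sketch inverts `skos:broader` into a narrower-concept index and then walks down from a chosen root.

```python
from collections import defaultdict

# Hypothetical in-memory copy of the triples described above; prefixes are
# abbreviations for this sketch, not the full URIs DBpedia actually uses.
TRIPLES = [
    ("dbr:IBM", "dct:subject", "category:Computer_hardware_companies"),
    ("category:Computer_hardware_companies", "skos:broader", "category:Computer_companies"),
    ("category:Computer_hardware_companies", "skos:broader", "category:Electronics_companies"),
    ("category:Computer_storage_companies", "skos:broader", "category:Computer_hardware_companies"),
    ("category:Fabless_semiconductor_companies", "skos:broader", "category:Computer_hardware_companies"),
]


def categories_of(resource, triples):
    """Categories assigned to a resource via dct:subject."""
    return [o for s, p, o in triples if s == resource and p == "dct:subject"]


def narrower_index(triples):
    """Invert skos:broader so that each concept maps to its narrower concepts."""
    children = defaultdict(list)
    for s, p, o in triples:
        if p == "skos:broader":
            children[o].append(s)
    return children


def render_tree(concept, children, depth=0, seen=None):
    """Return indented lines for concept and every concept below it.

    `seen` guards against cycles and prints each concept only once, since a
    category graph is not guaranteed to be a strict tree.
    """
    if seen is None:
        seen = set()
    if concept in seen:
        return []
    seen.add(concept)
    lines = ["  " * depth + concept]
    for child in sorted(children.get(concept, [])):
        lines += render_tree(child, children, depth + 1, seen)
    return lines


print(categories_of("dbr:IBM", TRIPLES))
print("\n".join(render_tree("category:Computer_companies", narrower_index(TRIPLES))))
```

In the output, `category:Computer_hardware_companies` appears as a child of `Computer_companies` and as the parent of the two narrower categories, which is what lets you grow a tree outward from any category. Because the same category also sits under `Electronics_companies`, the real category graph is not a strict tree, hence the `seen` guard.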
As we prepare to greet the New Year, we take a look back at the year that was. Some of the leading voices in the semantic web/Linked Data/Web 3.0 and sentiment analytics space give us their thoughts on the highlights of 2013.
Phil Archer, Data Activity Lead, W3C:
The completion and rapid adoption of the updated SPARQL specs, the use of Linked Data (LD) in life sciences, the adoption of LD by the European Commission, and governments in the UK, The Netherlands (NL) and more [stand out]. In other words, [we are seeing] the maturation and growing acknowledgement of the advantages of the technologies.
I contributed to a recent study into the use of Linked Data within governments. We spoke to various UK government departments as well as the UN FAO, the German National Library and more. The roadblocks and enablers section of the study (see here) is useful IMO.
Bottom line: Those organisations use LD because it suits them. It makes their own tasks easier and allows them to fulfill their public tasks more effectively. They don’t do it to be cool, and they don’t do it to provide 5-Star Linked Data to others. They do it for hard-headed and self-interested reasons.
Christine Connors, founder and information strategist, TriviumRLG:
What sticks out in my mind is the resource market: We’ve seen more “semantic technology” job postings, academic positions and M&A activity than I can remember in a long time. I think that this is a noteworthy trend if my assessment is accurate.
There’s also been a huge increase in attention from the librarian community, thanks to long-time work at the Library of Congress, from leading experts in that field and via schema.org.
Bob DuCharme recently wrote on his blog, “I think I’ve figured it out… Here’s how to sell the Semantic Web and Linked Data visions to the Big Data folk: don’t. Sell them on RDF technology. The process of selling a set of technologies usually means selling a vision, getting people psyched about that vision, and then telling them about the technology that implements that vision. For RDF technology (by which I mean RDF, SPARQL, and optionally, RDFS and OWL), the vision for many years was the Semantic Web. Some people in that community eventually decided that an easier vision to sell was Linked Data. (Linked Data may not always include RDF technology—when Tim Berners-Lee added ‘(RDF*, SPARQL)’ to his list of Linked Data principles, it became the filioque controversy of the Linked Data community—but the boundaries of this or other sets of technologies I’m discussing are not the issue here. The point is, it’s very common to use the Linked Data vision to sell people on the value of using URIs, triples, and SPARQL together.)”
Bob DuCharme has shared some interesting insights regarding SPARQL, RDF, and Big Data. He writes, “I think it’s obvious that SPARQL and other RDF-related technologies have plenty to offer to the overlapping worlds of Big Data and NoSQL, but this doesn’t seem as obvious to people who focus on those areas. For example, the program for this week’s Strata conference makes no mention of RDF or SPARQL. The more I look into it, the more I see that this flexible, standardized data model and query language align very well with what many of those people are trying to do.”
Bob DuCharme, author and speaker, has provided an excellent example of one of the benefits RDF has over XML. In his example, DuCharme shows how to perform a simple federated query with RDF across two different address books. He writes, “Once, at an XML Summer School session, I was giving a talk about semantic web technology to a group that included several presenters from other sessions. This included Henry Thompson, who I’ve known since the SGML days. He was still a bit skeptical about RDF, and said that RDF was in the same situation as XML—that if he and I stored similar information using different vocabularies, we’d still have to convert his to use the same vocabulary as mine or vice versa before we could use our data together.”
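The following Python sketch illustrates the general idea rather than DuCharme's actual example, which is not reproduced here. Two hypothetical address books name people with different properties (`vcard:fn` and `foaf:name`). Instead of converting either one, a couple of `rdfs:subPropertyOf` triples map both onto a hypothetical umbrella property `ex:name`, and a single query reads across the merged data. In SPARQL the same effect is often achieved with a `UNION` of the two patterns, or with an engine that applies RDFS inference.

```python
# Two hypothetical address books using different vocabularies (prefixes abbreviated).
BOOK_A = {
    ("bookA:p1", "vcard:fn", "Alice Example"),
    ("bookA:p2", "vcard:fn", "Carol Example"),
}
BOOK_B = {
    ("bookB:p1", "foaf:name", "Bob Example"),
}

# The mapping is itself just more triples; ex:name is a hypothetical property.
MAPPING = {
    ("vcard:fn", "rdfs:subPropertyOf", "ex:name"),
    ("foaf:name", "rdfs:subPropertyOf", "ex:name"),
}


def values_of(graph, prop):
    """Values of prop, or of any property declared a direct rdfs:subPropertyOf it."""
    props = {prop} | {s for (s, p, o) in graph
                      if p == "rdfs:subPropertyOf" and o == prop}
    return sorted(o for (s, p, o) in graph if p in props)


# With IRIs only (no blank nodes), merging RDF graphs is a plain set union.
merged = BOOK_A | BOOK_B | MAPPING
print(values_of(merged, "ex:name"))
```

Neither address book was rewritten; the only addition is two triples of mapping data, which is the kind of answer to Henry Thompson's objection that the RDF data model makes possible.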
Also, don’t forget to listen to our podcast here for more insights into what 2012 may hold.
- Interest in sentiment analysis exploded with the growth of the social Web, although its reputation suffered due to the prevalence of low-grade Twitter-sentiment toys: simplistic, wildly inaccurate systems that misled many into criticizing the concept when it was the cheap implementations they’d tried that were faulty. In 2012, sentiment analysis will come into its own: Automated (and crowd-sourced!) mining of attitudes, opinions, emotions, and intent from social and enterprise sources, at the “feature” level, linked to real-world profiles and transactional data. — Seth Grimes, founder, Alta Plana Corp
To accompany our recent podcast looking back on 2011, we’ve accumulated some additional perspectives from thought leaders in the next-wave Web space on the year that’s quickly passing us by.
Some highlights follow. You’ll see respondents hit on some common themes throughout, such as Big Data, sentiment analytics, specific vertical industry adoption, and the standards space:
- SKOS has become an increasingly popular entry point for organizations that want to use semantic technology in practical applications without worrying about the more complicated aspects of semantic web technology. – Bob DuCharme, solutions architect, TopQuadrant
It’s that spooky time of year again – in your neighborhood and on the Semantic Web, too. Put on your goblin getups, and see how some semantic webbers and related sites are getting Halloween treats into their mix:
- We’ll start with a response we got to a query we posed about how you might have some fun with Halloween-oriented SPARQL queries. From Bob DuCharme, solutions architect at TopQuadrant, comes a query to extract a SKOS taxonomy of horror movies from DBpedia.
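DuCharme's query itself is not reproduced here, but a query of that general shape might look like the following Python sketch, which builds a SPARQL 1.1 `CONSTRUCT` query returning the SKOS concepts below a DBpedia category. It assumes the DBpedia conventions described in the DBpedia taxonomy post above (categories are `skos:Concept` resources linked upward with `skos:broader`); the function name and the choice of `Horror_films` as the root category are assumptions for illustration.

```python
DBPEDIA_CATEGORY = "http://dbpedia.org/resource/Category:"
INVALID_IRI_CHARS = set('<>"{}|^`\\ ')


def skos_subtree_query(root_category: str) -> str:
    """Build a SPARQL CONSTRUCT query for the SKOS taxonomy under root_category.

    Assumes DBpedia categories are skos:Concept resources linked with skos:broader.
    """
    name = root_category.strip().replace(" ", "_")  # Wikipedia URIs use underscores
    if not name or INVALID_IRI_CHARS & set(name):
        raise ValueError(f"not usable in an IRI: {root_category!r}")
    root = f"<{DBPEDIA_CATEGORY}{name}>"
    # skos:broader* matches the root itself plus every concept beneath it;
    # the second OPTIONAL keeps only broader links whose parent is in the subtree.
    return f"""PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
CONSTRUCT {{
  ?concept a skos:Concept ;
           skos:prefLabel ?label ;
           skos:broader ?parent .
}}
WHERE {{
  ?concept a skos:Concept ;
           skos:broader* {root} .
  OPTIONAL {{ ?concept skos:prefLabel ?label }}
  OPTIONAL {{ ?concept skos:broader ?parent . ?parent skos:broader* {root} }}
}}"""


print(skos_subtree_query("Horror_films"))
```

Because `CONSTRUCT` skips only the template triples whose variables are unbound, the root still appears as a concept even though it has no parent inside the subtree. On a large, deep category graph, a query like this against the public DBpedia endpoint may time out or be truncated, so fetching the hierarchy a level or two at a time may be more practical.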