SPARQL

Retrieving and Using Taxonomy Data from DBpedia

DBpedia, as described in the recent semanticweb.com article DBpedia 2014 Announced, is “a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web.” It currently has over 3 billion triples (that is, facts stored using the W3C standard RDF data model) available for use by applications, making it a cornerstone of the semantic web.

A surprising amount of this data is expressed using the SKOS vocabulary, the W3C standard model for taxonomies used by the Library of Congress, the New York Times, and many other organizations to publish their taxonomies and subject headings. (semanticweb.com has covered SKOS many times in the past.) DBpedia has data about over a million SKOS concepts, arranged hierarchically and ready for you to pull down with simple queries so that you can use them in your RDF applications to add value to your own content and other data.
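As a taste of how simple those queries can be, here is a minimal illustrative sketch (not taken from the article) that lists a few of those concepts with their English labels; it can be pasted into DBpedia's public SPARQL endpoint at http://dbpedia.org/sparql:

# List a handful of DBpedia's SKOS concepts with their English labels.
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?concept ?label
WHERE {
  ?concept a skos:Concept ;
           skos:prefLabel ?label .
  FILTER (lang(?label) = "en")
}
LIMIT 20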

Where is this taxonomy data in DBpedia?

Many people think of DBpedia as mostly storing the fielded “infobox” information that you see in the gray boxes on the right side of Wikipedia pages—for example, the names of the founders and the net income figures that you see on the right side of the Wikipedia page for IBM. If you scroll to the bottom of that page, you’ll also see the categories that have been assigned to IBM in Wikipedia such as “Companies listed on the New York Stock Exchange” and “Computer hardware companies.” The Wikipedia page for Computer hardware companies lists companies that fall into this category, as well as two other interesting sets of information: subcategories (or, in taxonomist parlance, narrower categories) such as “Computer storage companies” and “Fabless semiconductor companies,” and then, at the bottom of the page, categories that are broader than “Computer hardware companies” such as “Computer companies” and “Electronics companies.”

How does DBpedia store this categorization information? The DBpedia page for IBM shows that DBpedia includes triples saying that IBM has Dublin Core subject values such as category:Companies_listed_on_the_New_York_Stock_Exchange and category:Computer_hardware_companies. The DBpedia page for the category Computer_hardware_companies shows that it is a SKOS concept with values for the two key properties of a SKOS concept: a preferred label and broader values. The category:Computer_hardware_companies concept is itself the broader value of several other concepts such as category:Fabless_semiconductor_companies. Because it’s the broader value of other concepts and has its own broader values, it can be both a parent node and a child node in a tree of taxonomic terms, so DBpedia has the data that lets you build a taxonomy hierarchy around any of its categories.
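As a hedged illustration of that last point (the query is a generic sketch, not from the article), the following asks DBpedia for the immediate taxonomic neighborhood of one category: its broader parents, plus the narrower categories that point back to it with skos:broader.

PREFIX skos:     <http://www.w3.org/2004/02/skos/core#>
PREFIX category: <http://dbpedia.org/resource/Category:>

SELECT ?direction ?related
WHERE {
  {
    # One step up: categories broader than Computer_hardware_companies.
    category:Computer_hardware_companies skos:broader ?related .
    BIND ("broader" AS ?direction)
  }
  UNION
  {
    # One step down: categories whose broader value is Computer_hardware_companies.
    ?related skos:broader category:Computer_hardware_companies .
    BIND ("narrower" AS ?direction)
  }
}
ORDER BY ?direction ?related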

Read more

Introducing SPARQLGraph, a Platform for Querying Biological Semantic Web Databases

Dominik Schweiger, Zlatko Trajanoski and Stephan Pabinger recently wrote, “Semantic Web has established itself as a framework for using and sharing data across applications and database boundaries. Here, we present a web-based platform for querying biological Semantic Web databases in a graphical way. Results: SPARQLGraph offers an intuitive drag & drop query builder, which converts the visual graph into a query and executes it on a public endpoint. The tool integrates several publicly available Semantic Web databases, including the databases of the just recently released EBI RDF platform. Furthermore, it provides several predefined template queries for answering biological questions. Users can easily create and save new query graphs, which can also be shared with other researchers.” Read more
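To give a sense of the kind of query such a visual builder generates behind the scenes, here is a purely hypothetical sketch: the ex: class and property names are placeholders, not the vocabulary of any of the integrated databases, and a dragged-and-dropped graph of "protein → participates in → pathway" might translate into something of this shape.

PREFIX ex: <http://example.org/bio#>

SELECT ?protein ?pathway
WHERE {
  # ex: terms are hypothetical placeholders for a real biological vocabulary.
  ?protein a ex:Protein ;
           ex:participatesIn ?pathway .
  ?pathway a ex:Pathway .
}
LIMIT 100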

SPARQL City’s Benchmark Results Showcase New Possibilities in Enterprise Graph Analytics

Solution demonstrates 10x+ the performance while running on 100x the data

NoSQL Now 2014 & SemTechBiz 2014

San Diego – August 20, 2014 – SPARQL City, which introduced its scalable graph analytic engine to market earlier this year, today announced that it has successfully run the SP2 SPARQL benchmark on 100 times the data volume used by other graph solution providers, while still delivering an order of magnitude better performance on average compared to published results.

SPARQL City ran the SP2 Benchmark against 2.5 billion triples/edges on a sixteen-node cluster on Amazon EC2. Average query response time for the set of seventeen queries was about 6 seconds, with query 4, the most data-intensive query involving the entire dataset, taking approximately 34 seconds to run. By comparison, the best reported query 4 result from other graph solution providers has been around 15 seconds, but that is when running against 25 million triples/edges, or 1/100th of the data volume in SPARQL City’s benchmark test. This level of performance, combined with the ability to easily scale out the solution on a cluster when required, makes easy-to-use interactive graph analytics on very large datasets possible for the first time. Detailed benchmark results can be found on our website.
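As a rough back-of-the-envelope reconciliation of those query 4 numbers — assuming, purely for the comparison, that query time scales linearly with data volume, an assumption the announcement itself does not make:

\[
\frac{2.5 \times 10^{9}}{25 \times 10^{6}} = 100,
\qquad
\frac{15\,\mathrm{s} \times 100}{34\,\mathrm{s}} \approx 44
\]

That is, scaling the best previously reported 15-second result up to 2.5 billion triples would suggest something on the order of 1,500 seconds, against SPARQL City’s 34 seconds for the same query.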

Read more

SPARQL And NoSQL: A Match On Many Levels

Is SPARQL the SQL for NoSQL? The question will be discussed at this month’s Semantic Technology & Business Conference in San Jose by Arthur Keen, VP of solution architecture at startup SPARQL City.

It’s not the first time that the industry has considered common database query languages for NoSQL (see this story at our sister site Dataversity.net for some perspective on that). But as Keen sees it, SPARQL has the legs for the job. “What I know about SPARQL is that for every database [SQL and NoSQL alike] out there, someone has tried to put SPARQL on it,” he says, whereas other common query language efforts may be limited in database support. A factor in SPARQL’s favor is query portability across NoSQL systems. Additionally, “you can achieve much higher performance using declarative query languages like SPARQL because they specify the ‘What’ and not the ‘How’ of the query, allowing optimizers to choose the best way to implement the query,” he explains.
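A small generic sketch of that "what, not how" distinction (an illustration, not an example from the interview): the query below describes only the shape of the answer — people, the friends they know, and those friends' mailboxes — and leaves join order, index choice, and execution strategy entirely to the engine's optimizer.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?person ?friendEmail
WHERE {
  # Declarative: no access paths or join order are specified here.
  ?person a foaf:Person ;
          foaf:knows ?friend .
  ?friend foaf:mbox ?friendEmail .
}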

Read more

YarcData Software Update Points Out That The Sphere Of Semantic Influence Is Growing

Recent updates to YarcData’s software for its Urika analytics appliance reflect the fact that the enterprise is starting to understand the impact that semantic technology has on turning Big Data into actual insights.

The latest update includes integration with more enterprise data discovery tools, including the visualization and business intelligence tools Centrifuge Visual Network Analytics and TIBCO Spotfire, as well as those based on SPARQL and RDF, JDBC, JSON, and Apache Jena. The goal is to streamline the process of getting data in and then being able to provide connectivity to the tools analysts use every day.

As customers see the value of using the appliance to gain business insight, they want to be able to more tightly integrate this technology into wider enterprise workflows and infrastructures, says Ramesh Menon, YarcData vice president, solutions. “Not only do you want data from all different enterprise sources to flow into the appliance easily, but the value of results is enhanced tremendously if the insights and the ability to use those insights are more broadly distributed inside the enterprise,” he says. “Instead of having one analyst write queries on the appliance, 200 analysts can use the appliance without necessarily knowing a lot about the underlying, or semantic, technology. They are able to use the front end or discovery tools they use on a daily basis, not have to leave that interface, and still get the benefit of the Urika appliance.”

Read more

A Look Into Learning SPARQL With Author Bob DuCharme

The second edition of Bob DuCharme’s Learning SPARQL debuted this summer. The Semantic Web Blog connected with DuCharme – who is director of digital media solutions at TopQuadrant, the author of other works including XML: The Annotated Specification, and also a welcome speaker both at the Semantic Technology & Business Conference and our Semantic Web Blog podcasts – to learn more about the latest version of the book.

Semantic Web Blog: In what I believe has been two years since the first edition was published, what have been the most significant changes in the ‘SPARQL space’ – or the semantic web world at large — that make this the right time for an expanded edition of Learning SPARQL?

DuCharme: The key thing is that SPARQL 1.1 is now an actual W3C Recommendation. It was great to see it so widely implemented so early in its development process, which justified the release of the book’s first edition so long before 1.1 was set in stone, but now that it’s a Recommendation we can release an edition of the book that is no longer describing a moving target. Not much in SPARQL has changed since the first edition – the VALUES keyword replaced BINDINGS, with some tweaks, and some property path syntax details changed – but it’s good to know that nothing in 1.1 can change now.
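For readers who haven't used it, VALUES inlines a small table of bindings directly into a query; here is a minimal sketch against DBpedia (the choice of resources is arbitrary and only for illustration):

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?company ?category
WHERE {
  # SPARQL 1.1 VALUES constrains ?company to an inline list of resources.
  VALUES ?company { dbr:IBM dbr:Intel }
  ?company dct:subject ?category .
}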

Read more

MarkLogic 7 Vision: World-Class Triple Store and World-Beating Information Store

Last month at its MarkLogic World 2013 conference, the enterprise NoSQL database platform provider talked semantics as it relates to its MarkLogic Server technology, which ingests, manages, and searches structured, semi-structured, and unstructured data (see our story here). Late last week the vendor was scheduled to provide an early-access release of MarkLogic 7, formally due by year’s end, to several dozen initial users.

“People see a convergence of search and semantics,” Stephen Buxton, Director, Product Management, recently told The Semantic Web Blog. To that end, a lot of the vendor’s customers have deployed MarkLogic technology as well as specialized triple stores, but what they really want is an integrated approach, “a single database that does both individually and both together,” he says. “We see the future of search as semantics and the future of semantics as search, and they are very much converging.” At its recent conference, Buxton says the company demonstrated a MarkLogic app it built to function like Google’s Knowledge Graph to provide an idea of the kinds of things the enterprise might do with both search and semantics together.

Following up on the comments made by MarkLogic CEO Gary Bloom at his keynote address at the conference, Buxton explained that “the function in MarkLogic we are working on in engineering is a way to store and manage triples in the MarkLogic database natively, right alongside structured and unstructured information – a specialized triples index so queries are very fast, and so you can do SPARQL queries in MarkLogic. So, with MarkLogic 7 we will have a world-class triple store and world-beating information store – no one else does documents, values and triples in combination the way MarkLogic 7 will.”

Read more

Helping Autism Researchers, And Others, With Some SPARQL Savvy

One in 50 American children has autism, according to the latest figures released by the Centers for Disease Control and Prevention in March. One of the winners of the YarcData Graph Analytics Challenge, announced in April, can make a difference in better understanding the causes of the disease.

Taking second place in the competition, the work of Adam Lugowski, Dr. John Gilbert, and Kevin Dewesse, of the University of California at Santa Barbara, leveraged a dataset created for the Mayo Clinic Smackdown project, which has the same structure and property types – and scale – as the medical organization’s actual Big Data sets around autism, but which uses publicly available data in place of the real thing. The team can’t use the real data because it includes private information about patients, diagnoses, prescriptions, and the like.

But the actual data deployed for the project doesn’t matter, says Lugowski. “The goal is to find relationships we have never thought of before, and this way it doesn’t prejudice the algorithm,” he says. Using YarcData’s Urika graph analytics appliance, the algorithm queries the Smackdown dataset – which in its smallest version has almost 40 million RDF triples and in its largest is about 100 times bigger, mirroring the size of all the Mayo Clinic’s actual autism data – to discover commonalities among the data, mimicking how the real data sets could be queried in search of common precursors among clusters of patients with the diagnosis.
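The article does not describe the Smackdown dataset's vocabulary, so the property names below (ex:hasDiagnosis, ex:hasPrecursor) are hypothetical stand-ins; the shape of the query, though — group patients by a shared attribute and count how often it co-occurs with the diagnosis — is the kind of commonality search being described.

PREFIX ex: <http://example.org/clinical#>

SELECT ?precursor (COUNT(DISTINCT ?patient) AS ?patientCount)
WHERE {
  # Hypothetical schema: diagnosed patients and their recorded precursors.
  ?patient ex:hasDiagnosis ex:Autism ;
           ex:hasPrecursor ?precursor .
}
GROUP BY ?precursor
HAVING (COUNT(DISTINCT ?patient) > 100)
ORDER BY DESC(?patientCount)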

Read more

Eleven SPARQL 1.1 Specifications Are W3C Recommendations

The W3C has announced that eleven specifications of SPARQL 1.1 have been published as recommendations. SPARQL is the Semantic Web query language. We caught up with Lee Feigenbaum, VP Marketing & Technology at Cambridge Semantics Inc. to discuss the significance of this announcement. Feigenbaum is a SPARQL expert who currently serves as the Co-Chair of the W3C’s SPARQL Working Group, leading the design of SPARQL.

Feigenbaum says, “SPARQL 1.1 is a huge leap forward in providing a standard way to access and update Semantic Web data. By reaching W3C Recommendation status, Semantic Web developers, vendors, publishers and consumers have a stable, well-vetted, and interoperable set of standards they can rely on for the foreseeable future.”
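The "update" half of that sentence refers to SPARQL 1.1 Update, one of the eleven documents; a minimal, self-contained sketch (the graph name and data are invented for illustration):

PREFIX ex: <http://example.org/>

INSERT DATA {
  # Add two triples about a made-up resource to a named graph.
  GRAPH <http://example.org/graphs/people> {
    ex:alice a ex:Person ;
             ex:worksFor ex:AcmeCorp .
  }
}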

Read more

Music To Your Ears: Seevl Takes First Step To Become Cross-Platform Music Discovery Service

Seevl, the free music discovery service that leverages semantic technology to help users conduct searches across a world of facts-in-combination to find new musical experiences and artist information, has launched an app for Deezer that will formally go live Monday.  (See our in-depth look at Seevl here, and a screencast of how the service works here.) Deezer is a music streaming service available in more than 150 countries – not the U.S. yet, though – that claims more than 20 million users.

Seevl, which late last year updated its YouTube plug-in with more music discovery features and better integration with the YouTube user interface, models its data in RDF. In a blog post earlier this year, founder and CEO Alexandre Passant explained how the Seevl service uses Redis for simple key-value queries and SPARQL for some more complex operations, like recommendations or social network analysis, as well as provenance. As for the new Deezer app, it provides the same features as the YouTube app for easily navigating and discovering music among millions of tracks, Passant tells the Semantic Web Blog.
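Seevl's own schema isn't shown in the post, but a recommendation query of the general kind described — artists that share the most genres with a seed artist — might look like the following sketch, which borrows DBpedia's dbo:genre property and an arbitrary seed artist purely for illustration:

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT ?artist (COUNT(?genre) AS ?sharedGenres)
WHERE {
  # Genres of the seed artist, then other artists tagged with the same genres.
  dbr:Radiohead dbo:genre ?genre .
  ?artist dbo:genre ?genre .
  FILTER (?artist != dbr:Radiohead)
}
GROUP BY ?artist
ORDER BY DESC(?sharedGenres)
LIMIT 10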

Read more
