Dominik Schweiger, Zlatko Trajanoski and Stephan Pabinger recently wrote, “Semantic Web has established itself as a framework for using and sharing data across applications and database boundaries. Here, we present a web-based platform for querying biological Semantic Web databases in a graphical way. Results: SPARQLGraph offers an intuitive drag & drop query builder, which converts the visual graph into a query and executes it on a public endpoint. The tool integrates several publicly available Semantic Web databases, including the databases of the just recently released EBI RDF platform. Furthermore, it provides several predefined template queries for answering biological questions. Users can easily create and save new query graphs, which can also be shared with other researchers.”
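To make the idea of turning a visual graph into a query concrete, here is a minimal sketch in Python. It is not SPARQLGraph's actual code: the edge-list format, the function names, and the example prefixes and terms are all assumptions chosen for illustration. Sending the query to an endpoint is left out.

```python
# Hypothetical sketch of a graph-to-SPARQL conversion.
# Function names and the edge format are illustrative assumptions,
# not taken from SPARQLGraph.

def term(value):
    """Render a node or edge label as a SPARQL term (simplified rules)."""
    if value.startswith("?"):                       # variable, e.g. ?protein
        return value
    if value.startswith(("http://", "https://")):   # full IRI
        return f"<{value}>"
    if ":" in value:                                # prefixed name, e.g. up:Protein
        return value
    return f'"{value}"'                             # plain literal (no escaping here)


def graph_to_sparql(edges, prefixes=None, limit=None):
    """Turn (subject, predicate, object) edges into a SELECT query string."""
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in (prefixes or {}).items()]

    # Collect variables in order of first appearance for the SELECT clause.
    variables = []
    for edge in edges:
        for part in edge:
            if part.startswith("?") and part not in variables:
                variables.append(part)

    lines.append("SELECT " + (" ".join(variables) if variables else "*"))
    lines.append("WHERE {")
    for s, p, o in edges:
        lines.append(f"  {term(s)} {term(p)} {term(o)} .")
    lines.append("}")
    if limit is not None:
        lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


# Example graph: two edges a user might draw in a visual builder.
edges = [
    ("?protein", "rdf:type", "up:Protein"),
    ("?protein", "up:organism", "?organism"),
]
prefixes = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "up": "http://purl.uniprot.org/core/",
}
query = graph_to_sparql(edges, prefixes, limit=10)
print(query)
```

Each edge drawn between two nodes becomes one triple pattern in the WHERE clause, and every node or edge whose label starts with "?" becomes a selected variable.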
Solution demonstrates 10x+ the performance while running on 100x the data
San Diego – August 20, 2014 – SPARQL City, which introduced its scalable graph analytic engine to market earlier this year, today announced that it has successfully run the SP2 SPARQL benchmark on 100 times the data volume used by other graph solution providers, while still delivering an order of magnitude better performance on average compared to published results.
SPARQL City ran the SP2 Benchmark against 2.5 billion triples/edges on a sixteen-node cluster on Amazon EC2. Average query response time for the set of seventeen queries was about 6 seconds, with query 4, the most data-intensive query, which involves the entire dataset, taking approximately 34 seconds to run. By comparison, the best reported query 4 result by other graph solution providers has been around 15 seconds, but this is when running against 25 million triples/edges, or 1/100th of the data volume in SPARQL City’s benchmark test. This level of performance, combined with the ability to easily scale out the solution on a cluster when required, makes easy-to-use interactive graph analytics on very large datasets possible for the first time. Detailed benchmark results can be found on our website.
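The query 4 figures above can be put side by side with a few lines of arithmetic. This is only a rough sketch: the per-triple comparison assumes that query time scales linearly with data volume, which the announcement does not claim, and the quoted timings are themselves approximate.

```python
# Figures quoted in the announcement (both timings are approximate).
sparql_city_triples = 2_500_000_000   # 2.5 billion triples/edges
sparql_city_q4_seconds = 34
other_triples = 25_000_000            # 25 million triples/edges
other_q4_seconds = 15

# How much more data, and how much more time, for query 4.
data_ratio = sparql_city_triples / other_triples
time_ratio = sparql_city_q4_seconds / other_q4_seconds

# Assumption (not from the announcement): normalize by data volume,
# i.e. compare seconds per triple as if scaling were linear.
per_triple_ratio = data_ratio / time_ratio

print(f"data volume ratio:        {data_ratio:.0f}x")
print(f"query 4 time ratio:       {time_ratio:.2f}x")
print(f"per-triple speed ratio:   {per_triple_ratio:.1f}x")
```

On these numbers, query 4 took a little over twice as long on 100 times the data, which works out to roughly 44 times less time per triple under the linear-scaling assumption.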
Is SPARQL the SQL for NoSQL? The question will be discussed at this month’s Semantic Technology & Business Conference in San Jose by Arthur Keen, VP of solution architecture at startup SPARQL City.
It’s not the first time that the industry has considered common database query languages for NoSQL (see this story at our sister site Dataversity.net for some perspective on that). But as Keen sees it, SPARQL has the legs for the job. “What I know about SPARQL is that for every database [SQL and NoSQL alike] out there, someone has tried to put SPARQL on it,” he says, whereas other common query language efforts may be limited in database support. A factor in SPARQL’s favor is query portability across NoSQL systems. Additionally, “you can achieve much higher performance using declarative query languages like SPARQL because they specify the ‘What’ and not the ‘How’ of the query, allowing optimizers to choose the best way to implement the query,” he explains.
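Keen's point about the "What" versus the "How" can be illustrated with a toy example in Python. This is a sketch under stated assumptions, not how any real SPARQL engine works: the data, the function names, and the crude "most selective pattern first" heuristic are all invented for illustration. The query author only lists triple patterns, and the engine decides the order in which to evaluate them.

```python
# Toy triple store; data and names are illustrative only.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "knows", "alice"),
    ("alice", "worksAt", "acme"),
    ("carol", "worksAt", "acme"),
]


def is_var(term):
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple; return None on conflict."""
    new = dict(binding)
    for p, v in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, v) != v:   # variable already bound to something else
                return None
        elif p != v:                        # constant does not match
            return None
    return new


def selectivity(pattern):
    """Crude cost estimate: how many triples the pattern matches on its own."""
    return sum(match(pattern, t, {}) is not None for t in TRIPLES)


def evaluate(patterns):
    """Answer a conjunctive query. The engine, not the author, picks the order."""
    ordered = sorted(patterns, key=selectivity)   # the "how" lives here
    bindings = [{}]
    for pattern in ordered:
        extended = []
        for b in bindings:
            for t in TRIPLES:
                b2 = match(pattern, t, b)
                if b2 is not None:
                    extended.append(b2)
        bindings = extended
    return bindings


# The "what": who knows someone that works at acme?
patterns = [("?a", "knows", "?b"), ("?b", "worksAt", "acme")]
for row in evaluate(patterns):
    print(row["?a"], "knows", row["?b"])
```

Because the query is just a set of patterns, the same answers come back whichever order the author writes them in. Production optimizers use far richer statistics than this count-based heuristic, but the division of labor is the same.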
Recent updates to YarcData’s software for its Urika analytics appliance reflect the fact that the enterprise is starting to understand the impact that semantic technology has on turning Big Data into actual insights.
The latest update includes integration with more enterprise data discovery tools, including the visualization and business intelligence tools Centrifuge Visual Network Analytics and TIBCO Spotfire, as well as those based on SPARQL and RDF, JDBC, JSON, and Apache Jena. The goal is to streamline the process of getting data in and then being able to provide connectivity to the tools analysts use every day.
As customers see the value of using the appliance to gain business insight, they want to be able to more tightly integrate this technology into wider enterprise workflows and infrastructures, says Ramesh Menon, YarcData vice president, solutions. “Not only do you want data from all different enterprise sources to flow into the appliance easily, but the value of results is enhanced tremendously if the insights and the ability to use those insights are more broadly distributed inside the enterprise,” he says. “Instead of having one analyst write queries on the appliance, 200 analysts can use the appliance without necessarily knowing a lot about the underlying, or semantic, technology. They are able to use the front end or discovery tools they use on a daily basis, not have to leave that interface, and still get the benefit of the Urika appliance.”
Last month at its MarkLogic World 2013 conference, the enterprise NoSQL database platform provider talked semantics as it related to its MarkLogic Server technology that ingests, manages and searches structured, semi-structured, and unstructured data (see our story here). The vendor late last week was scheduled to provide an early access release of MarkLogic 7, formally due by year’s end, to some dozens of initial users.
“People see a convergence of search and semantics,” Stephen Buxton, Director, Product Management, recently told The Semantic Web Blog. To that end, a lot of the vendor’s customers have deployed MarkLogic technology as well as specialized triple stores, but what they really want, he says, is an integrated approach, “a single database that does both individually and both together. We see the future of search as semantics and the future of semantics as search, and they are very much converging.” At its recent conference, Buxton says the company demonstrated a MarkLogic app it built to function like Google’s Knowledge Graph to provide an idea of the kinds of things the enterprise might do with both search and semantics together.
Following up on the comments made by MarkLogic CEO Gary Bloom at his keynote address at the conference, Buxton explained that, “the function in MarkLogic we are working on in engineering is a way to store and manage triples in the MarkLogic database natively, right alongside structured and unstructured information – a specialized triples index so queries are very fast, and so you can do SPARQL queries in MarkLogic. So, with MarkLogic 7 we will have a world-class triple store and world-beating information store – no one else does documents, values and triples in combination the way MarkLogic 7 will.”
One in 50 American children have autism, according to the latest figures released by the Centers for Disease Control and Prevention in March. One of the winners of the YarcData Graph Analytics Challenge, announced in April, can make a difference in better understanding the causes of the disease.
Taking second place in the competition, the work of Adam Lugowski, Dr. John Gilbert, and Kevin Dewesse, of the University of California at Santa Barbara, leveraged a dataset created for the Mayo Clinic Smackdown project, which has the same structure and property types – and scale – as the medical organization’s actual Big Data sets around autism, but which uses publicly available data in place of the real thing. The team can’t use the real data because it includes private information about patients, diagnoses, prescriptions, and the like.
But the actual data deployed for the project doesn’t matter, says Lugowski. “The goal is to find relationships we have never thought of before, and this way it doesn’t prejudice the algorithm,” he says. Using YarcData’s uRIKA graph analytics appliance, the algorithm queries the Smackdown dataset – which in its smallest version has almost 40 million RDF triples and in its largest is about 100 times bigger, mirroring the size of all the Mayo Clinic’s actual autism data – to discover commonalities among the data, mimicking how the real data sets could be queried in search of common precursors among clusters of patients with the diagnosis.
The W3C has announced that eleven specifications of SPARQL 1.1 have been published as recommendations. SPARQL is the Semantic Web query language. We caught up with Lee Feigenbaum, VP Marketing & Technology at Cambridge Semantics Inc. to discuss the significance of this announcement. Feigenbaum is a SPARQL expert who currently serves as the Co-Chair of the W3C’s SPARQL Working Group, leading the design of SPARQL.
Feigenbaum says, “SPARQL 1.1 is a huge leap forward in providing a standard way to access and update Semantic Web data. By reaching W3C Recommendation status, Semantic Web developers, vendors, publishers and consumers have a stable, well-vetted, and interoperable set of standards they can rely on for the foreseeable future.”
Seevl, the free music discovery service that leverages semantic technology to help users conduct searches across a world of facts-in-combination to find new musical experiences and artist information, has launched an app for Deezer that will formally go live Monday. (See our in-depth look at Seevl here, and a screencast of how the service works here.) Deezer is a music streaming service available in more than 150 countries – not the U.S. yet, though – that claims more than 20 million users.
Seevl, which late last year updated its YouTube plug-in with more music discovery features and better integration with the YouTube user interface, models its data in RDF. In a blog post earlier this year, founder and CEO Alexandre Passant explained how the Seevl service uses Redis for simple key-value queries and SPARQL for some more complex operations, like recommendations or social network analysis, as well as provenance. As for the new Deezer app, it provides the same features as the YouTube app for easily navigating and discovering music among millions of tracks, Passant tells the Semantic Web Blog.
When the Nobel Prize winners for 2013 are announced in the fall, perhaps there also will be some challenges issued to the worldwide community of data enthusiasts to see what they can do with open Linked Data about the prizes that have been awarded since the beginning of the 20th century.
Right now that’s just on the wish lists of Matthias Palmér and Hannes Ebner, co-founders of MetaSolutions AB, a spin-off from the Royal Institute of Technology in Stockholm and Uppsala University focused on semantic and scalable web apps. But a solid start has been made through their work with Nobel Media AB, which develops and manages programs, productions and media rights of the Nobel Prize within the areas of digital and broadcast media, including the Nobelprize.org domain, on the Nobel Prize Linked Data set.