Posts Tagged ‘SPARQL’
The Journal of Biomedical Semantics has published an article by Alejandro Rodriguez Gonzalez, Alison Callahan, Jose Cruz-Toledo, et al., entitled “Automatically exposing OpenLifeData via SADI semantic Web Services.” The abstract begins, “Two distinct trends are emerging with respect to how data is shared, collected, and analyzed within the bioinformatics community. First, Linked Data, exposed as SPARQL endpoints, promises to make data easier to collect and integrate by moving towards the harmonization of data syntax, descriptive vocabularies, and identifiers, as well as providing a standardized mechanism for data access. Second, Web Services, often linked together into workflows, normalize data access and create transparent, reproducible scientific methodologies that can, in principle, be re-used and customized to suit new scientific questions.” Read more
The end of the month should see the release of an update to Cognitum’s Fluent Editor 2014 ontology editor, bringing new capabilities intended to drive its adoption not only in academia but also in industries such as energy and pharmaceuticals.
Among the additions is the ability to run analytical computations over ontologies by combining the open-source statistical language R with Cognitum’s Controlled Natural Language. What has been lacking when it comes to performing computations over Big Data sets, says CEO Paweł Zarzycki, is a shortcut for easily combining the semantic and numerical worlds. R is great for statistical analysis over huge sets of numerical data, he says, but more knowledge opens up when Cognitum’s language is leveraged for qualitative analysis.
Bob DuCharme recently wrote, “The combination of microdata and schema.org seems to have hit a sweet spot that has helped both to get a lot of traction. I’ve been learning more about microdata recently, but even before I did, I found that the W3C’s Microdata to RDF Distiller written by Ivan Herman would convert microdata stored in web pages into RDF triples, making it possible to query this data with SPARQL. With major retailers such as Walmart and BestBuy making such data available on—as far as I can tell—every single product’s web page, this makes some interesting queries possible to compare prices and other information from the two vendors.” Read more
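As a rough illustration of the kind of cross-vendor comparison DuCharme describes, the sketch below assumes the distilled triples use the schema.org vocabulary (schema:Product, schema:offers, schema:price) and that triples from each retailer’s pages have been loaded into separate named graphs; the graph IRIs here are illustrative placeholders, not real identifiers.

```sparql
PREFIX schema: <http://schema.org/>

# Compare prices for same-named products extracted via the
# microdata-to-RDF distiller from two retailers' pages.
# The named-graph IRIs are hypothetical placeholders.
SELECT ?name ?priceA ?priceB
WHERE {
  GRAPH <urn:example:walmart> {
    ?productA a schema:Product ;
              schema:name ?name ;
              schema:offers [ schema:price ?priceA ] .
  }
  GRAPH <urn:example:bestbuy> {
    ?productB a schema:Product ;
              schema:name ?name ;
              schema:offers [ schema:price ?priceB ] .
  }
}
```

In practice, matching products purely on schema:name is fragile; identifiers such as GTINs, where present in the microdata, make for more reliable joins.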
DBpedia, as described in the recent semanticweb.com article DBpedia 2014 Announced, is “a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web.” It currently has over 3 billion triples (that is, facts stored using the W3C standard RDF data model) available for use by applications, making it a cornerstone of the semantic web.
A surprising amount of this data is expressed using the SKOS vocabulary, the W3C standard model for taxonomies used by the Library of Congress, the New York Times, and many other organizations to publish their taxonomies and subject headers. (semanticweb.com has covered SKOS many times in the past.) DBpedia has data about over a million SKOS concepts, arranged hierarchically and ready for you to pull down with simple queries so that you can use them in your RDF applications to add value to your own content and other data.
Where is this taxonomy data in DBpedia?
Many people think of DBpedia as mostly storing the fielded “infobox” information that you see in the gray boxes on the right side of Wikipedia pages—for example, the names of the founders and the net income figures that you see on the right side of the Wikipedia page for IBM. If you scroll to the bottom of that page, you’ll also see the categories that have been assigned to IBM in Wikipedia such as “Companies listed on the New York Stock Exchange” and “Computer hardware companies.” The Wikipedia page for Computer hardware companies lists companies that fall into this category, as well as two other interesting sets of information: subcategories (or, in taxonomist parlance, narrower categories) such as “Computer storage companies” and “Fabless semiconductor companies,” and then, at the bottom of the page, categories that are broader than “Computer hardware companies” such as “Computer companies” and “Electronics companies.”
How does DBpedia store this categorization information? The DBpedia page for IBM shows that DBpedia includes triples saying that IBM has Dublin Core subject values such as category:Computer_hardware_companies. The DBpedia page for the category Computer_hardware_companies shows that it is a SKOS concept with values for the two key properties of a SKOS concept: a preferred label and broader values. The category:Computer_hardware_companies concept is itself the broader value of several other concepts such as category:Fabless_semiconductor_companies. Because it is the broader value of other concepts and has its own broader values, it can be both a parent node and a child node in a tree of taxonomic terms, so DBpedia has the data that lets you build a taxonomy hierarchy around any of its categories.
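For example, a query along these lines, run against DBpedia’s public SPARQL endpoint at http://dbpedia.org/sparql, pulls one level of the hierarchy in each direction around the “Computer hardware companies” category:

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX category: <http://dbpedia.org/resource/Category:>

# Broader (parent) and narrower (child) categories of
# "Computer hardware companies" in DBpedia's SKOS data.
SELECT ?broader ?narrower
WHERE {
  category:Computer_hardware_companies skos:broader ?broader .
  ?narrower skos:broader category:Computer_hardware_companies .
}
```

Repeating the skos:broader step (or, in SPARQL 1.1, using the property path skos:broader+) walks further up or down the tree to build out a full hierarchy.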
Accessing an enterprise’s semantic knowledge base poses challenges for the general business population. Development teams may already have integrated specific SPARQL queries into a customer app or custom dashboard, or otherwise accommodated some very task-oriented activities and searches, but that has its limits for non-technical users who want to explore outside the box. These users aren’t likely to write new SPARQL queries on their own — but neither do they necessarily want to wait for their IT departments to do it for them. Interactive query builders are an option, but some may find these still a little too technically oriented.
This is a problem that Metreeca is looking to solve with Graph Rover, a self-service search and analysis tool that enables non-technical users to interact visually with semantic knowledge bases. The company has just released the latest beta update of the product, which lets users build queries through a graphical interface. Graph Rover has been in development for two years while the company was in stealth mode, and tech lead Alessandro Bollini says it is already a mature solution that should be commercially available in the first quarter of 2015.
At the IESD14 (Intelligent Exploration of Semantic Data) challenge at this week’s ISWC 2014, the award went to LEAPS: A Semantic Web and Linked Data Framework for the Algal Biomass Domain. The application is the work of Monika Solanki, developed while she was at the Knowledge-Based Engineering Lab at Birmingham City University in the UK.
The motivation, according to slides about the project, is the idea that algae biomass-based biofuels could serve as a viable, sustainable alternative to fossil fuels. While many companies, governments and non-profit agencies have been researching the idea, the knowledge gathered exists in diverse formats and proprietary databases. What has been lacking is a knowledge-level infrastructure capable of providing semantic grounding to the datasets for algal biomass, the slides note.
The Pacific Northwest National Laboratory recently reported on Phys.org, “As computing tools and expertise used in conducting scientific research continue to expand, so have the enormity and diversity of the data being collected. Developed at Pacific Northwest National Laboratory, the Graph Engine for Multithreaded Systems, or GEMS, is a multilayer software system for semantic graph databases. In their work, scientists from PNNL and NVIDIA Research examined how GEMS answered queries on science metadata and compared its scaling performance against generated benchmark data sets. They showed that GEMS could answer queries over science metadata in seconds and scaled well to larger quantities of data.” Read more
Symplectic Takes Another Step In Helping Universities Engage In Research Collaboration And Discovery
This summer, Symplectic Limited became the first DuraSpace Registered Service Provider (RSP) for the VIVO Project, an open-source, open-ontology, open-process platform for hosting semantically structured information about the interests, activities and accomplishments of scientists and scholars. (See our coverage here.) “Universities want to capture all that their researchers do, collaborate and reuse the data the research brings out,” says Sabih Ali, head of brand at Symplectic. “A lot of them are looking to be a part of something like VIVO and join the whole semantic web technology movement, but they don’t have the capacity to do it themselves.”
Symplectic brings that to the table through its role as a services provider and through the expertise in data quality, organization and transfer it has gained as the developer of Elements, software that captures, collects and showcases institutional research and is used by many leading universities, including Cambridge and Oxford. It also offers clients an open-source VIVO harvester that allows the ingestion of the rich data Elements captures into VIVO profiles.
More recently, Symplectic has taken on the role of authorized services provider for Profiles Research Networking Software, as well. Profiles RNS is an NIH-funded open source tool to speed the process of finding researchers with specific areas of expertise for collaboration and professional networking. It’s based on the VIVO 1.4 ontology, with support for RDF, SPARQL, and Linked Open Data.
A Drupal ++ platform for semantic web biomedical data – that’s how Sudeshna Das describes eXframe, a reusable framework for creating online repositories of genomics experiments. Das – who among other titles is affiliate faculty of the Harvard Stem Cell Institute – is one of the developers of eXframe, which leverages Stéphane Corlosquet’s RDF module for Drupal to produce, index (into an RDF store powered by the ARC2 PHP library) and publish semantic web data in the second generation version of the platform.
“We used the RDF modules to turn eXframe into a semantic web platform,” says Das. “That was key for us because it hid all the complexities of semantic technology.”
One instance of the platform today can be found in the repository for stem cell data that is part of the Stem Cell Commons, the Harvard Stem Cell Institute’s community for stem cell bioinformatics. But Das notes that the importance of the platform’s reusability for building genomics repositories that automatically produce Linked Data as well as a SPARQL endpoint is that new repository instances can be built with much less effort. Working off Drupal as its base, eXframe has been customized to support biomedical data and to integrate biomedical ontologies and knowledge bases.
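To give a feel for what such a SPARQL endpoint enables, here is a minimal sketch of a query over experiment metadata. The class and property names are illustrative placeholders only — they are not eXframe’s actual vocabulary, beyond the standard Dublin Core terms.

```sparql
PREFIX dcterms: <http://purl.org/dc/terms/>

# Illustrative only: ex:Experiment is a placeholder class,
# not part of eXframe's published schema.
PREFIX ex: <http://example.org/schema#>

SELECT ?experiment ?title ?subject
WHERE {
  ?experiment a ex:Experiment ;
              dcterms:title ?title ;
              dcterms:subject ?subject .
}
LIMIT 10
```

Because the data is published as Linked Data, the ?subject values can point into shared biomedical ontologies, letting queries join experiments across repository instances.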
Elevada is looking for a software engineer. The job description states: “Elevada is a data management company seeking a skilled Software Engineer with 4+ years of professional development experience. This is an opportunity to get in early (employee number < 5) at a real company with a strong product vision + enterprise customers, real revenue, and a strong sales pipeline. Compensation will be a mix of cash and equity at the end of a trial contract. Below are parameters for the position. We will tailor responsibilities to suit the individual who best fits our culture and goals. Candidate responsibilities:
- Server-side development using Java, Spring Framework, JPA, Hibernate.
- Help maintain development and deployment infrastructure in Linux environments.”
Skills requirements include: