The World Wide Web Consortium has headline news today: The Semantic Web and eGovernment Activities are being merged into, and superseded by, the Data Activity, where Phil Archer serves as Lead. Two new working groups have also been chartered: CSV on the Web and Data on the Web Best Practices.
What’s driving this? First, Archer explains, the Semantic Web technology stack is now mature, and it’s time to put those updated standards to use. With RDF 1.1, the Linked Data Platform, SPARQL 1.1, RDB to RDF Mapping Language (R2RML), OWL 2, and Provenance all done or very close to it, it’s the right time “to take that very successful technology stack and try to implement it in the wider environment,” Archer says, rather than continue tinkering with the standards.
The second reason, he notes, is that a large community exists “that sees Linked Data, let alone the full Semantic Web, as an unnecessarily complicated technology. To many developers, data means JSON – anything else is a problem. During the Open Data on the Web workshop held in London in April, Open Knowledge Foundation co-founder and director Rufus Pollock said that if he suggested to the developers that they learn SPARQL he’d be laughed at – and he’s not alone,” Archer says. “We need to end the religious wars, where they exist, and try to make it easier to work with data in the format that people like to work in.”
The new CSV on the Web Working Group is an important step in that direction, following on the heels of efforts such as R2RML. Its job is to provide metadata about CSV files, such as column headings, data types, and annotations, and so make it easy to convert CSV into RDF (or other formats), which in turn eases data integration. “The working group will define a metadata vocabulary and then a protocol for how to link data to metadata (presumably using HTTP Link headers) or embed the metadata directly. Since the links between data and metadata can work in either direction, the data can come from an API that returns tabular data just as easily as it can a static file,” says Archer. “It doesn’t take much imagination to string together a tool chain that allows you to run SPARQL queries against ’5 Star Data’ that’s actually published as a CSV exported from a spreadsheet.”
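To make the idea concrete, here is a minimal Python sketch of how column metadata could drive a CSV-to-RDF conversion. It is an illustration only: the working group has yet to define its vocabulary, so the metadata keys (base, id_column, columns, property, datatype), the example.org URIs, and the sample rows are all assumptions made up for this example.

```python
import csv
import io

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical metadata description. The key names are illustrative
# assumptions, not the vocabulary the working group will define.
METADATA = {
    "base": "http://example.org/authority/",
    "id_column": "code",
    "columns": {
        "name": {
            "property": "http://www.w3.org/2000/01/rdf-schema#label",
            "datatype": "string",
        },
        "population": {
            "property": "http://example.org/ns#population",
            "datatype": "integer",
        },
    },
}

# Made-up sample data standing in for a CSV exported from a spreadsheet.
CSV_TEXT = "code,name,population\nA1,Northtown,12000\nB2,Southville,8500\n"


def csv_to_ntriples(csv_text, metadata):
    """Turn each described CSV cell into one N-Triples statement."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The identifier column is appended to the base to mint a subject URI.
        subject = "<%s%s>" % (metadata["base"], row[metadata["id_column"]])
        for column, spec in metadata["columns"].items():
            value = row[column]
            if spec["datatype"] == "integer":
                int(value)  # raises ValueError if the cell is not an integer
            # Escape backslashes and double quotes inside the literal.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            literal = '"%s"^^<%s%s>' % (escaped, XSD, spec["datatype"])
            lines.append("%s <%s> %s ." % (subject, spec["property"], literal))
    return lines


for line in csv_to_ntriples(CSV_TEXT, METADATA):
    print(line)
```

The resulting lines could be loaded into any triple store and queried with SPARQL, which is the kind of tool chain Archer describes. How the metadata is discovered (an HTTP Link header, an embedded block, or something else) is left out of the sketch.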
Archer wants it to be very clear that the W3C “is NOT closing its work on the Semantic Web. We believe that it is the way to do data at Web scale. It offers substantial benefits over alternative technologies in many scenarios. What we are trying to do is to focus on deployment and integration within the broader landscape.”
Improvements that can be made on the Semantic Web front will remain in the picture. For example, the RDF validation work discussed at a workshop in September may lead to a new working group next year to take up questions around how you can assess whether a dataset conforms to a given profile, using particular classes and properties in a defined way. Use cases for RDF validation include data portals and more sophisticated life science scenarios, for example, where you want to be sure that when data is ingested into a triple store it will conform to certain expectations.
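As a rough illustration of what checking a dataset against a profile could mean, the Python sketch below flags resources of a given class that lack a required property. It does not reflect any W3C specification: the profile format, the validate function, and the example.org data are hypothetical, and a real validator would also need to cover cardinality, datatypes, and value constraints.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT_DATASET = "http://www.w3.org/ns/dcat#Dataset"
DCT_TITLE = "http://purl.org/dc/terms/title"

# Hypothetical profile: for each class, the properties every instance must have.
PROFILE = {DCAT_DATASET: [DCT_TITLE]}

# Made-up triples, as (subject, predicate, object) tuples.
TRIPLES = [
    ("http://example.org/d1", RDF_TYPE, DCAT_DATASET),
    ("http://example.org/d1", DCT_TITLE, "Road traffic counts"),
    ("http://example.org/d2", RDF_TYPE, DCAT_DATASET),
]


def validate(triples, profile):
    """Return (subject, missing_property) pairs for profile violations."""
    # Index the graph: subject -> predicate -> list of objects.
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {}).setdefault(p, []).append(o)

    problems = []
    for s, props in sorted(by_subject.items()):
        for cls in props.get(RDF_TYPE, []):
            for required in profile.get(cls, []):
                if required not in props:
                    problems.append((s, required))
    return problems


for subject, missing in validate(TRIPLES, PROFILE):
    print("%s is missing %s" % (subject, missing))
```

A data portal could run a check like this at ingestion time and reject or quarantine records that fail it.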
There also may be opportunities to help address what is still a common complaint: that Semantic Web tools are not production-grade. For example, efforts could potentially be directed next year to standardizing the conversion of SPARQL into XQuery, with the aim of delivering a highly resilient system able to handle traffic peaks. “That might give us a subset of SPARQL,” Archer says. “I could imagine that some SPARQL queries wouldn’t translate, but I also could imagine that there could be something that handles 80 percent of the cases to build a high-end, high-capacity SPARQL query engine.”
In The New Mix
The Data on the Web Best Practices working group, he explains, will not define any new technologies but will guide data publishers (government, research scientists, cultural heritage organizations) to better use the Web as a data platform. Additionally, the Data Activity and the new Digital Publishing Activity, which will be led by former Semantic Web Activity Lead Ivan Herman, are now in a new domain called the Information and Knowledge Domain (INK), led by Ralph Swick.
“This marks a break from work on security, privacy and social which remain in the Technology & Society Domain (now led by Wendy Seltzer),” he says. “No division is perfect and the one between INK and T&S is diaphanous, but one unnatural division is being erased altogether. I’m delighted to say that joining us in the INK Domain is the XML Activity, meaning that there is no artificial boundary between work in different data formats.”
Also on the W3C agenda now is putting more staff effort into supporting vocabulary development and management, including encouraging the use of the W3C’s Community Group system for developing vocabularies/ontologies in specific domains. Archer would like to see greater take-up by Community Groups of the w3.org/ns namespace, and also wants to encourage the provision of multilingual comments, labels and usage notes. “DCAT is a prime example of this,” he says. Sandro Hawke, who is on the W3C staff at MIT, Archer adds, is building a tool to make vocabulary management in w3.org/ns space easier. “All of this is alongside our continuing support for the development of schema.org,” he notes, which operates under the Web Schemas Task Force of the Semantic Web Interest Group, a task force that exists to offer advice on vocab management and development generally.
More In The Semantic Web Plus Column
The direction in which things are moving, he says, shows that the Semantic Web has been and is a success, and one that the W3C is very proud of. “Linked Data is firmly embedded in the psyche of librarians, researchers, government departments, life science researchers, financial services and more,” he says. “Of course I don’t claim that those sectors *only* use Sem Web, and naturally there are detractors, but the benefits of persistent URIs as identifiers for things, concepts, buildings, legislation, etc. are obvious to a growing number of people.”
There are even developers with no Semantic Web background at all who are adopting what the Semantic Web community calls URIs, though they think of them simply as URLs used as identifiers. He points anecdotally to a website created in Great Britain by one such developer, whose basic idea was to deliver a URL as an identifier for every local authority in the country. “If people come from that angle and can then see the value of URIs as identifiers, that gets us a long way,” he says.
Not only that, but INSPIRE (Infrastructure for Spatial Information in the European Community) places a legally binding requirement on the 28 member states of the European Union to publish geospatial and environmental information according to European Commission-defined schemas. Its regulations were recently updated to allow URIs to serve as IDs, and it is now “positively promoting URIs as identifiers for points, polygons, places,” and so on, says Archer. “The European Commission sees the value of URIs.”
In addition, the Research Data Alliance is looking at Linked Data and how it is going to use it, Archer notes, adding that “libraries and researchers recognize the value of persistent URIs as identifiers and Linked Data as the technology to use them.” Be on the lookout, too, for GS1 to bring persistent URIs to barcodes in the retail sector, for looking up product provenance.
Also working in the Semantic Web’s favor is the way gaps that have kept development worlds apart are being bridged. Case in point: the buy-in for JSON-LD. “If you think JSON-LD is great and easy for you to use and see its advantage over pure JSON, that’s great,” he says. “And if you don’t want to think about it being RDF that’s fine… People will use what they want to use anyway, and if they are using something that interoperates with what other people are doing, that’s good.”
Says Archer, “Again, W3C is not giving up on Sem Web. We’re working to integrate it into everything else.”