Ivan Herman Discusses Lead Role At W3C Digital Publishing Activity — And Where The Semantic Web Can Fit In Its Work
There’s a (fairly) new World Wide Web Consortium (W3C) activity, the Digital Publishing Activity, and it’s headed up by Ivan Herman, formerly the Semantic Web Activity Lead there. That activity was subsumed in December by the W3C Data Activity, with Phil Archer taking the role of Lead (see our story here).
Begun last summer, the Digital Publishing Activity has, as Herman describes it, “millions of aspects, some that have nothing to do with the semantic web.” But some, happily, that do – and that are extremely important to the publishing community, as well.
Take annotation, and the concept of having two digital resources that you want to combine in a specific and structured way, like a web page and a sticky note. Herman notes that the Open Annotation Group at the W3C has done successful work on developing a model for annotating digital resources that is strongly related to RDF, and that “on the publishers’ side, having annotations is actually extremely important,” as well. E-readers today typically have some sort of annotation facility, some good and some bad, he says, but all proprietary. “If you have an e-pub file, even if you could put it into another reader, you can’t carry the annotation with you, and that’s not what you want,” he says.
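To make the "web page plus sticky note" idea concrete, here is a minimal sketch of how such an annotation might be written down as RDF-style triples. It assumes the Open Annotation vocabulary terms `oa:Annotation`, `oa:hasBody` and `oa:hasTarget`; the `urn:example:` and `example.org` identifiers and the helper names `annotate` and `lookup` are hypothetical, chosen only for illustration, and nothing here is drawn from Herman's remarks or from any reader's actual implementation.

```python
from typing import List, NamedTuple

# Namespaces: the Open Annotation vocabulary and the RDF "type" property.
OA = "http://www.w3.org/ns/oa#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def annotate(annotation_id: str, body: str, target: str) -> List[Triple]:
    """Link a body (the sticky note) to a target (the page) via one annotation node."""
    return [
        Triple(annotation_id, RDF_TYPE, OA + "Annotation"),
        Triple(annotation_id, OA + "hasBody", body),
        Triple(annotation_id, OA + "hasTarget", target),
    ]


def lookup(triples: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object attached to `subject` through `predicate`."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    triples = annotate(
        "urn:example:anno1",
        "urn:example:sticky-note",
        "http://example.org/page.html",
    )
    for t in triples:
        print(t.subject, t.predicate, t.obj)
```

Because the annotation is its own resource, separate from both the note and the page, it could in principle travel with the book from one reader to another, which is the portability Herman says is missing today.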
Consider in particular the education market, which Herman says is one of the most dynamic sectors in the publishing world. Students and educators have a tradition of annotating their paper textbooks, and it’s one the educational community wants to continue for electronic teaching materials, too. It’s crucial to explore the future of digital annotation for this whole area, he explains, ideally with the publishing industry as an active participant. “It’s relevant for the semantic web [community] because of the underlying RDF structure,” he says, and notes that this year we can potentially look forward to some work getting underway in the Activity to move the needle here.
Another area of intersection with the semantic web: “Metadata for publishers for e-books is absolutely essential,” says Herman. “It’s really a core business aspect for them,” in particular because it is the means by which readers find information about e-books on the web. A lot of effort has gone into metadata vocabularies to date, of course, with EDItEUR’s ONIX serving as one major example used by publishers. But in the electronic ecosystem, Herman points out, publishers will come across many more, and will confront questions about how to relate to libraries and their own metadata world (BIBFRAME and the Dublin Core Metadata Initiative, for instance). “And you can’t forget schema.org, which also has bibliographic information,” he says.
It’s a complicated area, but at the least, Herman says, he’s hopeful the Digital Publishing Activity can assist as a neutral source in properly documenting things, “and give some kind of guidelines, categorizations, mutual relationships of various vocabularies that would be extremely helpful.”
Herman also points out that, on the semantic front, there’s already interest from EDItEUR, for instance, in evolving the ONIX XML vocabulary into something built around RDF, and that, generally speaking, those with a stake in this area “realize this metadata should be bound to other vocabularies via the Linked Open Data cloud. Put another way, these metadata vocabularies for books should be part of LOD somehow,” he says. “We are in a position to maybe help that.”
When it comes to scholarly publishing, Herman also notes an increasing recognition that linear stories no longer really reflect the work that goes into science-related publishing projects. “You want to attach data, video and experiments of data, so you want essentially to publish data sets and then you suddenly hit a number of issues about data publishing,” he says. In an abstract sense, he says, you could consider these research objects together as a small graph, with RDF behind the scenes, that collects everything that makes up a research output, from the traditional article to the data sets to the multimedia and experimental results, and their relationships to each other.
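As a rough illustration of the "small graph" idea, the sketch below stores a research output as subject-predicate-object triples and walks the graph to gather everything reachable from the article. The predicate names (`hasDataset`, `hasVideo`, `derivedFrom`) and the `ex:` identifiers are hypothetical and do not come from any published vocabulary or from Herman's remarks; a real system would more likely use an RDF library and agreed-upon terms.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def build_graph(triples: Iterable[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Index triples by subject so outgoing links can be followed."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def collect_parts(graph: Dict[str, List[Tuple[str, str]]], root: str) -> Set[str]:
    """Return the root plus every resource reachable from it."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        # .get() avoids creating empty entries for leaf nodes.
        for _predicate, obj in graph.get(node, []):
            stack.append(obj)
    return seen


if __name__ == "__main__":
    # Hypothetical research object: an article, its data, a video, and raw results.
    triples = [
        ("ex:article", "hasDataset", "ex:dataset"),
        ("ex:article", "hasVideo", "ex:video"),
        ("ex:dataset", "derivedFrom", "ex:experiment-results"),
    ]
    print(sorted(collect_parts(build_graph(triples), "ex:article")))
```

The point of the graph form is that the article is just one node among several: the data set and the experimental results it was derived from are first-class parts of the same research output, connected by explicit relationships.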
“The scholarly publishing world for now is not yet prepared for that but long-term that has to change,” he says. It’s not a core W3C effort yet, “but eventually there may be another intersecting area between publishing in the general and traditional sense and data on the web in a very general sense.”
There’s a long road ahead, Herman acknowledges, but it’s an exciting time. Stay tuned.