
eBooks are cool, but they could get even cooler with EPUB3, the next version of the distribution and interchange format for digital books that has been widely adopted (well, except by Amazon). The latest version of the standard could give publishers more flexible ways to represent their offerings to digital book retailers, and add a lot of excitement to the eBook reading experience, too.

EPUB3 is based on HTML5 and was originally proposed to include RDFa. Whether RDFa will be used for eBook metadata is now in question, though it would still be possible to embed RDF/OWL within eBook content. (Membership comments on EPUB3 are due by Aug. 22.) EPUB3 requires the same three metadata elements as EPUB2, namely dc:identifier, dc:title, and dc:language, while also permitting many more. “We left it open to using something like RDFa so you can put in what you need to,” says Eric Freese, solutions architect at digital publishing solutions vendor Aptara. That could include, for example, using the PRISM (Publishing Requirements for Industry Standard Metadata) XML metadata vocabulary for managing and aggregating publishing content, or ONIX metadata for representing and communicating book industry product information.
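To make the three required elements concrete, here is a minimal Python sketch that checks a package metadata block for dc:identifier, dc:title, and dc:language. The sample XML, the placeholder identifier, and the function name `missing_required_metadata` are illustrative assumptions, not taken from the spec or from Freese; the sample is a fragment, not a complete package document.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"   # Dublin Core element namespace
OPF_NS = "http://www.idpf.org/2007/opf"      # package document namespace
REQUIRED = ("identifier", "title", "language")

# Hypothetical sample fragment; the identifier is a placeholder value.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Adventures of Tom Sawyer</dc:title>
    <dc:language>en</dc:language>
  </metadata>
</package>
"""


def missing_required_metadata(opf_xml):
    """Return the names of required dc: elements that are absent or empty."""
    root = ET.fromstring(opf_xml)
    metadata = root.find(f"{{{OPF_NS}}}metadata")
    if metadata is None:
        return list(REQUIRED)
    missing = []
    for name in REQUIRED:
        element = metadata.find(f"{{{DC_NS}}}{name}")
        if element is None or not (element.text or "").strip():
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_required_metadata(SAMPLE_OPF))
```

Anything beyond these three elements, such as PRISM or ONIX-derived fields, would be additional metadata layered on top and is not checked here.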

However the RDFa question fares, one thing that is increasingly clear to publishers that have done any looking at all into eBooks, Freese says, is that “it doesn’t take long before they get hit in the face with the metadata problem. And as more time goes by there are fewer and fewer publishers who haven’t thought about doing eBooks.”

Freese, who also is a member of the International Digital Publishing Forum, the digital publishing industry’s trade and standards organization from which EPUB hails, will be speaking about the semantic web and the publishing industry at the upcoming Semantic Web Media Summit.


Freese explains that the EPUB3 spec is essentially silent on using semantic technologies within the content but does not disallow their use either, and he’s hopeful that the industry will take advantage of that.

So EPUB3 can be used to address many of the issues around representing titles, letting epublishing retailers recognize a work by multiple official titles (“Adventures of Tom Sawyer” and “Adventures of Tom Sawyer, The,” for instance), across languages, or by its place in a series of volumes. That’s good for business, but here are some ways that Freese says EPUB3 and semantic technologies could make magic for Joe and Jill Reader, too:

● SKOS (Simple Knowledge Organization System) provides a model for representing knowledge organization systems on the Semantic Web, and for sharing and linking them. Freese says EPUB3 doesn’t specifically mention SKOS but it could be used, for example, for porting organized collections. That way, publishers could, for instance, include classification schemes or taxonomies of the hierarchy of species in a biology book.

● Because EPUB3 allows scripting through JavaScript, smarter searches might be possible, too. For instance, readers of that bio book who want to know about giraffes and squirrels could do a broad search on mammals, and both species would show up in the hits.

● With the navigational capabilities in the new standard, that same bio book could be presented for reading in order of the hierarchy of species, rather than serially by chapter.

● For connected reading devices, like the iPad, publishers could theoretically link to Linked Open Data sources chosen based on the book’s subject. For instance, a history textbook might cover U.S. presidents up through Barack Obama, and then rely on Linked Open Data to keep that content accurate and up to date as new presidents take office. There might be some issues of provenance of information to consider here, he notes, but publishers can accommodate that by indicating which sources they trust or otherwise letting their sites dictate what other information the works can link to.

● Textbook publishers might be the first to realize the most benefit from putting semantic information into content, but Freese could see some opportunities for trade publishers to get on board too. Imagine, for instance, enhancing the reading experience of a book like Dan Brown’s The Lost Symbol, set in Washington, D.C. “It could be cool if you could pop a map to see where the characters went to in DC, or get information about the Capitol Building just by inserting a couple links,” he muses.
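The broad-search and hierarchy-order ideas from the bio book example in the list above can be sketched in a few lines. This is a toy illustration in Python, assuming a hypothetical taxonomy in which each concept points to its broader concept (in the spirit of a SKOS "broader" relation) and hypothetical chapter file names; in an actual EPUB3 title the equivalent logic would presumably run as JavaScript inside the reading system.

```python
# Hypothetical taxonomy: each concept maps to its broader concept.
BROADER = {
    "giraffe": "mammal",
    "squirrel": "mammal",
    "sparrow": "bird",
    "mammal": "vertebrate",
    "bird": "vertebrate",
    "vertebrate": "animal",
}

# Hypothetical index of which chapter covers which concept.
CHAPTERS = {
    "ch03-giraffes.xhtml": "giraffe",
    "ch07-squirrels.xhtml": "squirrel",
    "ch09-sparrows.xhtml": "sparrow",
}


def ancestors(concept):
    """Walk the broader links upward and return every broader concept."""
    found = []
    while concept in BROADER:
        concept = BROADER[concept]
        if concept in found:  # guard against an accidental cycle
            break
        found.append(concept)
    return found


def broad_search(term):
    """Return chapters about the term itself or about anything narrower."""
    return sorted(
        chapter
        for chapter, concept in CHAPTERS.items()
        if concept == term or term in ancestors(concept)
    )


def taxonomy_order():
    """Order chapters by their path from the taxonomy root, not by number."""
    def path(concept):
        return list(reversed(ancestors(concept))) + [concept]
    return sorted(CHAPTERS, key=lambda chapter: path(CHAPTERS[chapter]))


if __name__ == "__main__":
    print(broad_search("mammal"))
    print(taxonomy_order())
```

A search on "mammal" returns the giraffe and squirrel chapters even though neither file name mentions mammals, and `taxonomy_order` groups chapters by branch of the hierarchy instead of by chapter number.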

Today, Freese says, some innovations are already taking place without EPUB3 – but those innovations could be facilitated with it and its HTML5 groundings. He mentions, for example, Theodore Gray’s coffee table book, The Elements, a photographic collection of all the elements in the periodic table which also was one of the first iPad applications. “The semantic part is that it’s hooked into WolframAlpha, so on the page about gold you can click on the WolframAlpha link and get the price of gold currently, chemical information about it, straight out of WolframAlpha. So it has a semanticish capability built right into that book that basically gives that capability to provide the latest, greatest information within the presentation of the printed material.”

A custom native app had to be written for the iPad to enable this. But that wouldn’t have to be the case for individual eBook platforms if EPUB3 were used. “An eBook really is a web site in a box. It’s just a bunch of HTML pages and images with EPUB wrappers for navigation and content around it, and with JavaScript you can do a lot of what you do in apps now within an epub book,” Freese says. “And the beauty of it is that it works across EPUB3 readers, assuming digital rights management is in place to let them play between readers.”
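The "web site in a box" description maps fairly directly onto the file format: an EPUB is a ZIP archive whose first entry is an uncompressed `mimetype` file, followed by a `META-INF/container.xml` that points to the package document, plus the HTML pages themselves. The Python sketch below assembles such a skeleton in memory. The `OEBPS/` folder name, the file names, and the function `build_epub_skeleton` are illustrative assumptions, and the result is not a complete EPUB3 on its own, since a real title also needs a full package document and a navigation document.

```python
import io
import zipfile

# Points the reading system at the package document inside the archive.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/package.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub_skeleton(package_opf, pages):
    """Zip a package document and HTML pages into EPUB-style container bytes.

    package_opf: text of the package document (supplied by the caller).
    pages: mapping of file name to XHTML text.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry comes first and is stored without compression.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML,
                         compress_type=zipfile.ZIP_DEFLATED)
        archive.writestr("OEBPS/package.opf", package_opf,
                         compress_type=zipfile.ZIP_DEFLATED)
        for name, xhtml in pages.items():
            archive.writestr(f"OEBPS/{name}", xhtml,
                             compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


if __name__ == "__main__":
    data = build_epub_skeleton("<package/>", {"index.xhtml": "<html/>"})
    print(zipfile.ZipFile(io.BytesIO(data)).namelist())
```

Everything a web page can carry, including scripts and images, would travel inside the same archive as additional entries.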

And hey, if you just want to read your eBook in peace and quiet, you should be able to do that too – assuming reading systems give users the capability to turn off the bells and whistles.