Even as semantic web concepts and tools underpin revolutionary changes in the way we discover and consume information, people with even a casual interest in the semantic web have difficulty understanding how and why this is happening. One of the most exciting application areas for semantic technologies is online publishing, yet thousands of small-to-medium-sized publishers find unfamiliar semantic concepts too intimidating to grasp the relevance of these technologies. This three-part series is part of my own journey to better understand how semantic technologies are changing the landscape for publishers of news and information. Read Part 1.


News and media organizations were well represented at the Semantic Technology and Business Conference in San Francisco this year. Among the organizations presenting were the New York Times, the Associated Press (AP), the British Broadcasting Corporation (BBC), Hearst Media Co., Agence France-Presse (AFP), and Getty Images.

It was interesting to note that, outside of the New York Times, which has been publishing a very detailed index since 1912, many news organizations presenting at the conference did not make the extensive classification of content a priority until the last decade or so. This makes sense: in a newspaper publishing environment, a detailed index guiding every reader directly to a specific subject mentioned in the paper must not have seemed as critical as it does now. Readers were unlikely to keep the newspaper as future reference material, so the work of indexing news content by subject was left for the most part to librarians, well after an article was published.

In the early days of the internet, categorization of content (where it existed) was limited to simple taxonomies or to free tagging. News organizations made rudimentary attempts to identify subjects covered by content, but did not provide much information about relationships between these subjects. Search functions matched the words in the search to the words in the content of the article or feature. Most websites still organize their content this way.

The drawback of this approach to online publishing is that it doesn’t make the most of the content “assets” publishers possess. Digital content has the potential to be either permanent or ephemeral: it can exist and be accessed by a viewer for as long as the publisher chooses to keep it, and many news organizations are beginning to realize the value of giving their material a longer shelf life by presenting it in different contexts. If you have just read an article about, say, Hillary Clinton, you might be interested in a related story about the State Department, or perhaps her daughter Chelsea, or her husband Bill. But how would any content management system be able to serve up a related story if no one had bothered to indicate somewhere what the story is about and how these people and/or concepts are related to one another?

This is where the semantic approach to publishing has a great deal to offer, and the main benefits put forward by the news organizations presenting at the conference fall into two categories: Do More With Less, and Do More With More. I’ll explain what I mean.

Do More With Less

The more content a news organization creates and attempts to relate to other content, the more unwieldy a traditional tagging and/or taxonomy system becomes. Given an article about Kim Kardashian, how do you relate it to all of her former and current love interests and family members? Applying all of those names as tags would soon grow very time consuming and impractical. With a system that lets you dynamically create relationships between tagging terms, concepts, and people (using RDF and OWL), even long after an article is published, you can not only create new and ever more varied relationships between content items, you can also automate the publication of content according to pre-defined subject areas, enable search for very specific kinds of content, and re-use and even syndicate content through APIs (application programming interfaces) in an ever-changing array of contexts (more on that later). This is the “dynamic” part of semantic publishing. Because the relationships between the subjects contained within each content item have been spelled out in advance using an ontology, content can be automatically published and re-ordered in an almost infinite number of contexts. For example, you could set parameters for your content management system to automatically generate a page dedicated to a golf tournament, with current and past articles about every player participating in it.
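To make the golf tournament example a little more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the player names, the tournament names, the `competesIn` relationship, and the article IDs are all invented, and a real system would store these relationships in RDF with an ontology rather than in plain Python sets. The point is simply that each article is tagged once with its main subjects, while the relationships between subjects live separately and can change later.

```python
# Hypothetical data: every name, relationship, and article ID here is
# invented for illustration and does not come from any real publisher.

# Relationships between subjects, stored as (subject, predicate, object)
# triples in the spirit of RDF.
TRIPLES = {
    ("Player A", "competesIn", "Spring Open"),
    ("Player B", "competesIn", "Spring Open"),
    ("Player C", "competesIn", "Autumn Classic"),
}

# Each article is tagged once, with only its main subjects.
ARTICLES = {
    "article-1": {"Player A"},
    "article-2": {"Player B", "Player C"},
    "article-3": {"Player C"},
    "article-4": {"Spring Open"},
}


def participants(tournament, triples):
    """Return every subject linked to the tournament by 'competesIn'."""
    return {s for s, p, o in triples if p == "competesIn" and o == tournament}


def tournament_page(tournament, articles, triples):
    """List articles about the tournament itself or any of its players."""
    wanted = participants(tournament, triples) | {tournament}
    return sorted(
        article_id
        for article_id, subjects in articles.items()
        if subjects & wanted  # at least one subject in common
    )


print(tournament_page("Spring Open", ARTICLES, TRIPLES))
# prints ['article-1', 'article-2', 'article-4']
```

If Player C later joins the Spring Open, adding the single triple `("Player C", "competesIn", "Spring Open")` is enough for `article-3` to appear on the page. No article has to be re-tagged.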

The Associated Press has created AP Metadata Services to help publications accomplish this kind of operational efficiency. AP’s service offers text analysis to identify concepts, people, and organizations within a text, and a taxonomy to relate these “entities” to one another in a structured framework. Here AP’s Vice President of Information Management Amy Sweigert describes what AP is gaining from its new service (just released this year):

Do More With More

Perhaps the most exciting thing semantic publishing has to offer is that it enables the immediate sharing and assimilation of data, content, and information across several different platforms. By organizing your content and defining relationships between the people, places, and subjects mentioned therein, you can quickly and efficiently start to pull in similarly structured data and information from the vast linked open data (LOD) universe in ways that directly relate to, and augment, your content. You can use LOD to add compelling content from external sources, or to provide parameters for extraordinarily detailed queries (searches using SPARQL). As part of the linked open data universe, you can also easily syndicate your own content through a linked data Application Programming Interface (API) or even just a SPARQL endpoint. Here is how Evan Sandhaus of the New York Times describes the benefits of the New York Times’ approach to semantically structuring its content:
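As a side note on the SPARQL queries mentioned above, here is a rough sketch of what asking a linked open data source a question can look like. This is an illustration of my own and not anything presented by the New York Times. I am assuming DBpedia's public SPARQL endpoint and its resource page for the New York Times purely as examples, and the function names are hypothetical. The code only builds the query and the request URL, so it does not contact any server.

```python
from urllib.parse import urlencode

# Assumption: DBpedia's public endpoint is used only as an example of a
# linked open data source. Any SPARQL endpoint could stand in for it.
ENDPOINT = "https://dbpedia.org/sparql"


def build_query(resource_uri, limit=10):
    """Build a SPARQL query listing properties and values of one resource."""
    return (
        "SELECT ?property ?value WHERE { "
        f"<{resource_uri}> ?property ?value . "
        "} "
        f"LIMIT {int(limit)}"
    )


def build_request_url(query, endpoint=ENDPOINT):
    """Encode the query as the 'query' parameter of a GET request URL."""
    return f"{endpoint}?{urlencode({'query': query})}"


query = build_query("http://dbpedia.org/resource/The_New_York_Times", limit=5)
print(query)
print(build_request_url(query))
```

Fetching the resulting URL would return whatever facts the endpoint holds about that resource, which a publisher could then display alongside its own content.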

New semantic technologies such as the Information Workbench (mentioned in part 1 of this series as a tool that the BBC is currently using) can augment and update content by bringing in data and information from both structured and unstructured sources (such as live performance data, as in a sports competition). It can also enable a news desk to quickly analyze, visualize (or graphically represent), and publish all kinds of data through graphs, charts, and maps, which is sure to be a hit with fans of data journalism. Here is a great demo of how news organizations like the BBC can use the Information Workbench.

If Content is King, Context is … Queen.

By making use of semantic web resources such as Facebook’s Open Graph Protocol, a publication is able to determine who you, the viewer, are, to whom you are connected, and what subjects or groups you like, if you use social media. Applying this to semantically annotated content means that a news organization is able to match and distribute content to subscribers according to their stated interests, or in a way that is related, directly or indirectly, to their social connections. Agence France-Presse is using a product called Profium Sense to do just that. It’s not hard to imagine that in the near future more and more news and media sites will automatically apply a set of distinct interface overlays to serve up content according to the viewing context: not just according to the device someone is using to view the content, but also according to the preferred subjects of the viewer or his or her social connections, without the viewer necessarily setting or indicating those preferences directly on the site. In other words, you could have the same content source serve you a different interface, with different suggested articles/media, at work, at home, and/or as part of an interest group.
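The matching step described above can be sketched very simply. The following is a minimal illustration under my own assumptions, not a description of how Profium Sense or any real product works: the article IDs, subjects, and the `rank_for_viewer` function are hypothetical, and a viewer's interests are reduced to a plain set of subject names.

```python
# Hypothetical annotated content: article IDs and subjects are invented.
ARTICLES = {
    "a1": {"golf", "business"},
    "a2": {"politics"},
    "a3": {"golf"},
}


def rank_for_viewer(articles, interests):
    """Order article IDs by how many of the viewer's interests each covers."""
    scored = [
        (len(subjects & interests), article_id)
        for article_id, subjects in articles.items()
    ]
    # Highest overlap first; ties are broken alphabetically by article ID.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Drop articles that share no subject with the viewer's interests.
    return [article_id for score, article_id in scored if score > 0]


print(rank_for_viewer(ARTICLES, {"golf", "business"}))
# prints ['a1', 'a3']
```

The same content source would return a different list for a viewer whose interests are, say, `{"politics"}`, which is the idea behind serving different suggestions in different contexts.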

That about sums up this week’s installment. Next week, for the third and final part of this series, I will be looking into some open source tools to help smaller publishers enter the world of Dynamic Semantic Publishing.


Kristen Milhollin is a writer, mother, champion of good causes, and semantic web enthusiast.  She is also the Project Lead for GoodSpeaks.org.