Kristen Milhollin

Kristen is a writer, mother, champion of good causes, and semantic web enthusiast. She is also the Project Lead for GoodSpeaks.org.

Semantic Video’s Banner Year

The BBC made use of semantic video annotation in its coverage of the 2012 Olympics

It’s fair to say that a good idea has finally “arrived” when it has left the realm of the theoretical and become the foundation of many popular tools, services, and applications.

That is surely the case with Semantic Video.

Gone are the days when internet video could best be described as a meaningless blob of content invisible to search and impossible to annotate and reuse in meaningful ways.

The past year has seen an explosion of practical (and popular) services and applications based on the extraction of meaningful metadata, and often linked data, from video content.

For those of us lucky enough to view it, the BBC wowed us last July with its Olympic coverage, broadcasting every event of the Games live on 24 HD streams, all accessible over the internet, with live, dynamic data and statistics on athletes. To pull off this feat, the BBC used a custom-designed Dynamic Semantic Publishing platform, which included fluid Operations’ Information Workbench to help author, curate, and publish ontology and instance data.

Read more

Linked Open Government Data: Dispatch from the Second International Open Government Data Conference

“What we have found with this project is… the capacity to take value out of open data is very limited.”

With the abatement of the media buzz surrounding open data since the first International Open Government Data Conference (IOGDC) was held in November 2011, it would be easy to believe that the task of opening up government data for public consumption is a fait accompli. Most of the discussion at this year’s IOGDC, held July 10-12, centered on the advantages of, and roadblocks to, creating an open data ecosystem within government, and on the need to establish the right mix of policies to promote a culture of openness and sharing, both within and between government agencies and externally with journalists, civil society, and the public at large. By the numbers, the open government data movement has much to celebrate: 1,022,787 datasets from 192 catalogs, in 24 languages, representing 43 countries and international organizations.

The looming questions about the utility of open government data make it clear, however, that the movement is still in its early stages. Much remains to be done to provide usable, reliable, machine-readable, and valuable government data to the public.

Read more

Dynamic Semantic Publishing for Beginners, Part 3

Even as semantic web concepts and tools are underpinning revolutionary changes in the way we discover and consume information, people with even a casual interest in the semantic web have difficulty understanding how and why this is happening. One of the most exciting application areas for semantic technologies is online publishing, yet for thousands of small-to-medium-sized publishers, unfamiliar semantic concepts make it hard to grasp the relevance of these technologies. This three-part series is part of my own journey to better understand how semantic technologies are changing the landscape for publishers of news and information. Read Part 2.

—-

So far we’ve looked at the “cutting edge” of dynamic semantic publishing (the BBC Olympics) and we’ve seen what tools large publishers such as the New York Times, the Associated Press, and Agence France-Presse are using to semantically annotate their content.

And we’ve learned how semantic systems help publishers “Do More With Less” (automating much of the work of organizing content and identifying key concepts, entities, and subjects) and “Do More With More” (combining their content with related linked open data and presenting it in different contexts).
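To make “Do More With Less” a little more concrete, here is a minimal sketch of automated entity identification using the open-source spaCy library; the sample text is invented, and no particular publisher’s pipeline is implied.

```python
# Minimal sketch of automated entity identification ("Do More With Less"),
# using spaCy's pretrained English model; the sample text is invented.
# Setup: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

text = (
    "The BBC covered the 2012 Olympics in London with a dynamic "
    "semantic publishing platform built with fluid Operations."
)

for ent in nlp(text).ents:
    # ent.text is the surface form, ent.label_ the entity type (ORG, GPE, DATE, ...)
    print(ent.text, ent.label_)
```

Each recognized entity can then be mapped to an identifier in a knowledge base, which is where the “Do More With More” step of linking to open data begins.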

You may still be asking at this point, “What makes this so novel and cool? We know that semantic tools save time and resources.” And some people say semantic publishing is about search optimization, especially after the arrival of Google’s Knowledge Graph. But the implications of semantic publishing are about so much more than search. What semantic systems are really designed for, to use the phrase attributed to Don Turnbull, is “information discovery,” and, if semantic standards and tools are widely adopted in the publishing world, this could have huge implications for content and data syndication.

Read more

A Simple Tool in a Complex World: An Interview with Zemanta CTO Andraz Tori

 

Andraz Tori is the Owner and Chief Technology Officer at Zemanta, a tool that uses natural language processing (NLP) to extract entities from the text of a blog post and enrich it with related media and articles drawn from Zemanta’s broad user base. This interview was conducted for Part 3 of the series “Dynamic Semantic Publishing for Beginners.”
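Zemanta’s own API is not shown here; as a rough stand-in for the “enrich it with related media and articles” step, the sketch below looks up resources related to an already-extracted entity on DBpedia’s public SPARQL endpoint. The entity URI and result handling are illustrative only.

```python
# Illustrative only: fetch resources related to an extracted entity from
# DBpedia's public SPARQL endpoint, as a stand-in for the kind of
# enrichment a service like Zemanta performs with its own data.
import requests

DBPEDIA_SPARQL = "https://dbpedia.org/sparql"

def related_resources(entity_uri: str, limit: int = 10) -> list[str]:
    query = f"""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT DISTINCT ?related WHERE {{
      <{entity_uri}> dbo:wikiPageWikiLink ?related .
    }} LIMIT {limit}
    """
    resp = requests.get(
        DBPEDIA_SPARQL,
        params={"query": query, "format": "application/sparql-results+json"},
        timeout=10,
    )
    resp.raise_for_status()
    return [b["related"]["value"] for b in resp.json()["results"]["bindings"]]

# Suggest related links for a post about the 2012 Olympics.
for uri in related_resources("http://dbpedia.org/resource/2012_Summer_Olympics"):
    print(uri)
```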

Q. Although the term “Dynamic Semantic Publishing” appears to have come out of the BBC’s coverage of the 2010 World Cup, it looks as though Zemanta has been applying many of the same principles on behalf of smaller publishers since 2008. Would you characterize it this way, or do you think that Zemanta is a more limited service with specific and targeted uses, while the platform built by the BBC is its own semantic ecosystem? How broadly should we define Dynamic Semantic Publishing?

A. What Zemanta does is empower the writer through semantic technologies. It’s like having an exoskeleton that gives you superpowers as an author. But Zemanta does not affect the post after it has been written. Dynamic semantic publishing, on the other hand, is based on the premise of assembling web pages piecemeal from a semantic database, usually in real time.
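To illustrate the distinction Tori draws, here is a hypothetical sketch of the “dynamic” side: a page assembled at request time from a triple store. The SPARQL endpoint, vocabulary, and property names are placeholders, not the BBC’s or any real publisher’s schema.

```python
# Hypothetical sketch of dynamic semantic publishing: at request time,
# query a triple store for stories tagged with a concept and assemble
# them into a page fragment. Endpoint and vocabulary are placeholders.
from SPARQLWrapper import SPARQLWrapper, JSON

def build_page(concept_uri: str) -> str:
    store = SPARQLWrapper("http://example.org/sparql")  # placeholder endpoint
    store.setReturnFormat(JSON)
    store.setQuery(f"""
        PREFIX ex: <http://example.org/schema/>
        SELECT ?headline ?summary WHERE {{
          ?story ex:about <{concept_uri}> ;
                 ex:headline ?headline ;
                 ex:summary  ?summary .
        }} LIMIT 20
    """)
    bindings = store.query().convert()["results"]["bindings"]
    items = [
        f"<li><h3>{b['headline']['value']}</h3><p>{b['summary']['value']}</p></li>"
        for b in bindings
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"

# Each request rebuilds the page from whatever the store currently holds,
# so newly annotated stories appear without manual page editing.
print(build_page("http://example.org/concepts/london-2012"))
```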

Read more

Dynamic Semantic Publishing for Beginners, Part 2

Even as semantic web concepts and tools are underpinning revolutionary changes in the way we discover and consume information, people with even a casual interest in the semantic web have difficulty understanding how and why this is happening. One of the most exciting application areas for semantic technologies is online publishing, yet for thousands of small-to-medium-sized publishers, unfamiliar semantic concepts make it hard to grasp the relevance of these technologies. This three-part series is part of my own journey to better understand how semantic technologies are changing the landscape for publishers of news and information. Read Part 1.

—-

News and media organizations were well represented at the Semantic Technology and Business Conference in San Francisco this year. Among the organizations presenting were the New York Times, the Associated Press (AP), the British Broadcasting Corporation (BBC), Hearst Media Co., Agence France-Presse (AFP), and Getty Images.

It was interesting to note that, outside of the New York Times, which has been publishing a very detailed index since 1912, many news organizations presenting at the conference did not make extensive classification of content a priority until the last decade or so. It makes sense that, in a newspaper publishing environment, creating a detailed index guiding every reader directly to a specific subject mentioned in the paper must not have seemed as critical as it does now; after all, the reader was unlikely to keep the newspaper as future reference material. So the work of indexing news content by subject was left, for the most part, to librarians, well after an article was published.

In the early days of the internet, categorization of content (where it existed) was limited to simple taxonomies or free tagging. News organizations made rudimentary attempts to identify the subjects covered by their content, but provided little information about the relationships between those subjects. Search functions simply matched the words in a query against the words in the article or feature. Most websites still organize their content this way.

The drawback of this approach to online publishing is that it doesn’t make the most of the content “assets” publishers possess. Digital content has the potential to be either permanent or ephemeral: it can exist and be accessed by a viewer for as long as the publisher chooses to keep it, and many news organizations are beginning to realize the value of giving their material a longer shelf life by presenting it in different contexts. If you have just read an article about, say, Hillary Clinton, you might be interested in a related story about the State Department, or perhaps her daughter Chelsea, or her husband Bill. But how would any content management system be able to serve up a related story if no one had bothered to indicate somewhere what the story is about and how these people and concepts are related to one another?
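One way to picture an answer, strictly as a toy sketch: if each story carries a few machine-readable statements about its subjects, related stories fall out of a simple query. The vocabulary and article URIs below are invented for illustration.

```python
# Toy example (invented vocabulary and article URIs): state what each
# story is about as RDF triples, then ask the graph for other stories
# sharing a subject with the one the reader just finished.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
g = Graph()

g.add((EX.article1, EX.about, EX.HillaryClinton))
g.add((EX.article1, EX.about, EX.StateDepartment))
g.add((EX.article2, EX.about, EX.StateDepartment))
g.add((EX.article3, EX.about, EX.ChelseaClinton))
# Richer relationships (e.g. ex:daughterOf) would let a query walk from
# Hillary Clinton to stories about Chelsea as well.
g.add((EX.ChelseaClinton, EX.daughterOf, EX.HillaryClinton))

related = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT DISTINCT ?other WHERE {
      ex:article1 ex:about ?subject .
      ?other ex:about ?subject .
      FILTER (?other != ex:article1)
    }
""")
for row in related:
    print(row.other)  # -> http://example.org/article2
```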

Read more

Dynamic Semantic Publishing for Beginners, Part 1

 

Even as semantic web concepts and tools are underpinning revolutionary changes in the way we discover and consume information, people with even a casual interest in the semantic web have difficulty understanding how and why this is happening. One of the most exciting application areas for semantic technologies is online publishing, yet for thousands of small-to-medium-sized publishers, unfamiliar semantic concepts make it hard to grasp the relevance of these technologies. This three-part series is part of my own journey to better understand how semantic technologies are changing the landscape for publishers of news and information.

————–

The 2010 World Cup was a notable first not only for Spain, but also for publishing and the BBC. The BBC’s coverage of the tournament marked a dramatic evolution in the way content can be delivered online. The new system was labeled dynamic semantic publishing (DSP) by the team of architects, including Jem Rayfield and Paul Wilton, that created it. DSP was soon defined as “utilizing Linked Data technology to automate the aggregation and publication of interrelated content objects.”

Read more

Under the Hood: A Closer Look at Information Workbench

fluid Operations’ Information Workbench is part of the semantic infrastructure supporting the BBC’s revolutionary coverage of the 2012 Olympic Games. Below is a conversation with fluid Operations Senior Architect for Research & Development Michael Schmidt in advance of his 2012 Semantic Technology and Business Conference presentation. This conversation is a supplement to the series “Dynamic Semantic Publishing for Beginners.”

Q. Is the Information Workbench a response to the need for more robust applications to help process “Big Data”? How is it different than other popular tools?

A. Dealing with Big Data involves a number of different challenges, including increasing volume (amount of data), complexity (of schemas and structures), and variety (range of data types, sources).

However, most Big Data solutions available on the market today focus on volume only, in particular supporting vertical scalability (greater operating capacity, efficiency, and speed). This means that such solutions mainly address the analysis of large volumes of similarly structured data sets. Yet the Big Data problem is not fully solved by technologies that only help you process similarly structured data more quickly and efficiently.

Read more

Dynamic Semantic Publishing for News Organizations


Paul Wilton was technical and development lead for semantic publishing at BBC News and Sport Online during the 2010 World Cup. Currently he is the technical architect at Ontoba. In this interview, a supplement to “Dynamic Semantic Publishing for Beginners,” Paul describes the current landscape for DSP as it applies to news organizations.

Q. Are you seeing a wide disparity in the way that news organizations have approached the creation and use of semantically-linked (or annotated) content?

A. Actually, the pattern, and often the (general) technical architecture, is surprisingly similar. Where things differ is in the applications, the models used, and the instance data. This is undoubtedly bleeding-edge technology, and typically the impetus to begin investigating the use of linked data, RDF, and semantics in the technology stack has come from within the Information Architecture and R&D teams, not from the offices of the CTO/CIO. Maybe this is starting to change now.

Q. Do many news organizations have the resources (staff and/or Content Management Systems) needed to publish and use semantic data?

A. Not in our experience, but this shouldn’t be a barrier to integrating semantic technologies and publishing linked data.

The key components of semantic publishing (a semantic repository, or triple store; appropriate linked data sets; and the ability to semantically annotate your content) can be built alongside an existing Content Management System.

Read more
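As a hedged illustration of the third component Wilton lists, annotating content without replacing the CMS, the sketch below emits schema.org JSON-LD for an article; the field values and subject URI are invented, and how the output is attached to a CMS template will vary.

```python
# Sketch: generate schema.org JSON-LD for an article managed by an
# existing CMS, so the published page carries machine-readable
# annotations. Field values and the subject URI are invented.
import json

article = {
    "headline": "BBC brings dynamic semantic publishing to the Olympics",
    "datePublished": "2012-08-01",
    "about": ["http://dbpedia.org/resource/2012_Summer_Olympics"],
}

jsonld = {
    "@context": "https://schema.org",
    "@type": "NewsArticle",
    "headline": article["headline"],
    "datePublished": article["datePublished"],
    "about": [{"@id": uri} for uri in article["about"]],
}

# Embed the result in the article template inside a
# <script type="application/ld+json"> tag; the same statements can also
# be loaded into a triple store as they are published.
print(json.dumps(jsonld, indent=2))
```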