Dynamic Semantic Publishing for Beginners, Part 2

Even as semantic web concepts and tools underpin revolutionary changes in the way we discover and consume information, people with even a casual interest in the semantic web have difficulty understanding how and why this is happening. One of the most exciting application areas for semantic technologies is online publishing, yet for thousands of small-to-medium-sized publishers, unfamiliar semantic concepts make it hard to grasp the relevance of these technologies. This three-part series is part of my own journey to better understand how semantic technologies are changing the landscape for publishers of news and information. Read Part 1.


News and media organizations were well represented at the Semantic Technology and Business Conference in San Francisco this year. Among the organizations presenting were the New York Times, the Associated Press (AP), the British Broadcasting Corporation (BBC), Hearst Media Co., Agence France-Presse (AFP), and Getty Images.

It was interesting to note that, apart from the New York Times, which has been publishing a very detailed index since 1912, many of the news organizations presenting at the conference did not make extensive classification of content a priority until the last decade or so. It makes sense: in a newspaper publishing environment, creating a detailed index that guides every reader directly to a specific subject mentioned in the paper must not have seemed as critical as it does now, since readers were unlikely to keep the newspaper as future reference material. The work of indexing news content by subject was therefore left, for the most part, to librarians, well after an article was published.

In the early days of the internet, categorization of content (where it existed) was limited to simple taxonomies or to free tagging. News organizations made rudimentary attempts to identify the subjects covered by their content, but did not provide much information about the relationships between those subjects. Search functions simply matched the words in the query to the words in the content of an article or feature. Most websites still organize their content this way.
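The word-matching behavior just described can be sketched in a few lines of Python. This is a toy illustration of the approach, not any particular publisher's search code; the articles and queries are invented for the example:

```python
def keyword_search(query, articles):
    """Return articles whose text contains every word in the query.

    This is the naive word-matching approach: no synonyms and no
    relationships between subjects, so "Clinton" will not match a
    story about the Secretary of State even when they are the same
    person.
    """
    terms = query.lower().split()
    return [a for a in articles
            if all(t in a["text"].lower() for t in terms)]

articles = [
    {"title": "Budget vote", "text": "Congress debates the federal budget."},
    {"title": "Diplomacy", "text": "The Secretary of State visited Asia."},
]

print(keyword_search("budget congress", articles)[0]["title"])  # Budget vote
print(keyword_search("Clinton", articles))  # [] -- relevant story missed
```

The second query returns nothing, which is exactly the limitation the next section turns to: without stated relationships between subjects, relevant content stays invisible.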

The drawback of this approach to online publishing is that it doesn't make the most of the content "assets" publishers possess. Digital content has the potential to be either permanent or ephemeral: it can exist and be accessed by a viewer for as long as the publisher chooses to keep it, and many news organizations are beginning to realize the value of giving their material a longer shelf life by presenting it in different contexts. If you have just read an article about, say, Hillary Clinton, you might be interested in a related story about the State Department, or perhaps her daughter Chelsea, or her husband Bill. But how would any content management system be able to serve up a related story if no one had bothered to indicate somewhere what the story is about and how these people and concepts are related to one another?
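The idea of serving related stories from stated relationships can be sketched with a toy in-memory graph. The triples, story titles, and relation names below are invented for illustration; a real system would store these as RDF and query them with SPARQL:

```python
# A toy "knowledge graph": (subject, relation, object) statements.
triples = [
    ("Hillary Clinton", "headOf",   "State Department"),
    ("Chelsea Clinton", "childOf",  "Hillary Clinton"),
    ("Bill Clinton",    "spouseOf", "Hillary Clinton"),
]

# Stories tagged with the entities they are about.
stories = {
    "Clinton visits Burma":          ["Hillary Clinton"],
    "State Dept. budget questioned": ["State Department"],
    "Chelsea Clinton joins NBC":     ["Chelsea Clinton"],
}

def related_entities(entity):
    """Entities directly connected to `entity`, in either direction."""
    related = set()
    for s, _, o in triples:
        if s == entity:
            related.add(o)
        if o == entity:
            related.add(s)
    return related

def related_stories(entity):
    """Stories about anything connected to `entity`."""
    targets = related_entities(entity)
    return [title for title, subjects in stories.items()
            if any(subj in targets for subj in subjects)]

print(sorted(related_stories("Hillary Clinton")))
# ['Chelsea Clinton joins NBC', 'State Dept. budget questioned']
```

Because the relationships are stated once, a reader finishing the Hillary Clinton story can be offered the State Department and Chelsea stories without any editor hand-picking them.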

Read more

Resolve Names in Freebase Data with :BaseKB

Ontology2 has announced the release of :BaseKB Early Access 2 (EA2), a tool for accessing Freebase data in RDF.

Paul Houle, founder of Ontology 2, says, “:BaseKB is an important milestone for both Freebase and the Semantic Web. :BaseKB opens Freebase to users of SPARQL and other RDF standards.  The superior quality of Freebase data solves data quality problems that have,  so far,  frustrated Linked Data applications.”
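To see what "Freebase data in RDF" looks like in practice, here is a minimal sketch of resolving opaque machine ids to human-readable names via `rdfs:label` triples. The N-Triples below only illustrate the shape of such data; the ids, namespace, and labels are made up and are not actual :BaseKB output:

```python
import re

# Illustrative N-Triples in the shape of Freebase-style RDF.
# The machine ids and labels are invented for this example.
ntriples = """\
<http://rdf.example/m.0abc12> <http://www.w3.org/2000/01/rdf-schema#label> "Ada Lovelace"@en .
<http://rdf.example/m.0abc12> <http://www.w3.org/2000/01/rdf-schema#label> "Ada Lovelace"@fr .
<http://rdf.example/m.0xyz99> <http://www.w3.org/2000/01/rdf-schema#label> "Charles Babbage"@en .
"""

# Match only English-language rdfs:label statements.
LABEL = re.compile(
    r'<([^>]+)> <http://www\.w3\.org/2000/01/rdf-schema#label> "([^"]+)"@en \.')

names = {m.group(1): m.group(2) for m in LABEL.finditer(ntriples)}
print(names["http://rdf.example/m.0abc12"])  # Ada Lovelace
```

In a real deployment you would load the dump into a triple store and ask the same question in SPARQL; the point is only that once the data is RDF, standard tools can do this lookup.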

Read more

Expert Panel Finalized for #SemTechBiz San Francisco Program

Q: What do Google, Microsoft, Yahoo!, Yandex, the New York Times, and The Walt Disney Company have in common?


On June 2, 2011, schema.org was launched with little fanfare, but it quickly received a lot of attention. Now, almost exactly one year later, we have assembled a panel of experts from the organizations listed above to discuss what has happened since and what we have to look forward to as the vocabulary continues to grow and evolve, including up-to-the-minute news and announcements. The panel will take place at the upcoming Semantic Technology and Business Conference in San Francisco.

Moderated by Ivan Herman, the Semantic Web Activity Lead for the World Wide Web Consortium, the panel includes representatives from each of the core search engines involved in schema.org, and two of the largest early implementers: The New York Times and Disney. Among the topics we will discuss are the value proposition of using schema.org markup, publishing techniques and syntaxes, vocabularies that have been mapped to schema.org, current tools and applications, existing implementations, and a look forward at what is planned and what is needed to encourage adoption and consumption.


Moderator: Ivan Herman
Semantic Web Activity Lead,
World Wide Web Consortium

Dan Brickley
Contractor, schema.org at Google

John Giannandrea
Director of Engineering,
Google

Peter Mika
Senior Researcher,
Yahoo!

Alexander Shubin
Product Manager,
Head of Strategic Direction,
Yandex

Mike Van Snellenberg
Principal Program Manager,
Microsoft

Evan Sandhaus
Semantic Technologist,
New York Times Company

Jeffrey W. Preston
SEO Manager,
Disney Interactive Media Group

These panelists, along with the rest of the more than 120 speakers at SemTechBiz, will be on hand to answer audience questions and discuss the latest work in semantic technologies. You can join the discussion by registering for SemTechBiz San Francisco today (and save $200 off the onsite price).


Google Just Hi-jacked the Semantic Web Vocabulary

[Editor’s Note: This guest editorial is provided by Sean Golliher. He can be found on Twitter at @seangolliher]

The Semantic Web’s LOD Cloud

Google has announced that it is rolling out new enhancements to its search technology, which it calls the "Knowledge Graph." For those involved in the Semantic Web, Google's "Knowledge Graph" is nothing new. Watching the video and reading through the announcements, one gets the impression that Google's engineers believe they have created something new and innovative.

Google’s “new” Knowledge Graph

While it's commendable that Google is improving search, it's interesting to note the direct translations of Google's "new language" into the existing semantic web vocabulary. Normally engineers and researchers quote, or at least reference, the original sources of their ideas, yet one can't help but notice that the semantic web isn't mentioned in any of Google's announcements. Watching the reactions from the semantic web community, I found that many took notice of the language Google used and of how ideas from the semantic web were repackaged as "new" and discovered by Google.

Read more

Google Knowledge Graph Interview

Google's Knowledge Graph has been the subject of a lot of attention in the days since its announcement, and the focus of a lot of questions, too.

There's been discussion on chat boards, for instance, about just who's gotten access and who hasn't. In a discussion with a representative from Google, The Semantic Web blog has learned that, like many other new Google services, the roll-out is gradual, to ensure the system handles the new functions well. Access is coming first to users who are signed into Google, though not yet to everyone who is signed in. The plan is to have all signed-in users on board over the next few days, the rep says, so if you are signed in and don't have it yet, it should be hitting your browser shortly. Those not signed into Google accounts probably have a week or two left to wait. So far, the rep said, things have been pretty smooth, and Google is going at the pace it was hoping for.

Read more

Big TV Metadata

Red Bee Media, a company that "builds bridges between content and viewers," has posted a new article to its corporate blog regarding the growing volumes of television metadata. The article states, "TV Metadata is becoming increasingly rich and complex – powering increasingly advanced experiences. At a basic level, metadata tells us which programmes are available, and informs us about the content of those programmes. But metadata is getting richer and even bigger to support more visually engaging and functionally sophisticated user experiences."

Read more

Semantic Commerce: Structuring Your Retail Website for the Next Generation Web

Are you wondering why your product pages don't stand out in search results like those from Amazon (shown below) or other competing e-commerce websites? These expanded results are commonly known as Rich Snippets (as named by Google) and are the result of having your HTML structured correctly with semantic markup. Whether you're savvy to HTML5 and the latest design trends, or you haven't updated your website code in years, this article will explain why it's important to structure your data properly using semantic standards.

Sample of Rich Snippet result

There are a number of ways to structure your data to make it more relevant to search engines, as well as social media sites. As an e-commerce retailer, it is important to understand which of these standards you should consider including in your website. You should take some time to ensure you are implementing semantic markup, and doing it correctly. It has the power to give potential customers upfront knowledge before they land on your site: they can see product reviews, pricing and stock information, and even images before clicking through to your website. This can increase click-through rates, improve conversions, and generally advance your SEO objectives.
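One of the standards in question is schema.org microdata for products. The sketch below generates a minimal `schema.org/Product` fragment of the kind search engines read to build Rich Snippets; the product name, values, and the choice of just these three properties (name, offer, aggregate rating) are an illustrative minimum, not a complete Product markup:

```python
from html import escape

def product_snippet(name, price, currency, rating, review_count):
    """Render a minimal schema.org/Product microdata fragment.

    Only a handful of properties are shown; real product pages
    typically mark up images, availability, brand, and more.
    """
    return f"""\
<div itemscope itemtype="http://schema.org/Product">
  <span itemprop="name">{escape(name)}</span>
  <div itemprop="offers" itemscope itemtype="http://schema.org/Offer">
    <span itemprop="price">{price}</span>
    <meta itemprop="priceCurrency" content="{currency}">
  </div>
  <div itemprop="aggregateRating" itemscope
       itemtype="http://schema.org/AggregateRating">
    <span itemprop="ratingValue">{rating}</span> stars from
    <span itemprop="reviewCount">{review_count}</span> reviews
  </div>
</div>"""

html = product_snippet("Widget Pro", "19.99", "USD", "4.5", "128")
print('itemtype="http://schema.org/Product"' in html)  # True
```

The visible page still just shows a name, a price, and a star rating; the `itemscope`/`itemtype`/`itemprop` attributes are what tell a crawler which text plays which role.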

Read more

The Semantic Link with Guest, Denny Vrandecic – February, 2012

Paul Miller, Bernadette Hyland, Ivan Herman, Eric Hoffer, Andraz Tori, Peter Brown, Christine Connors, Eric Franzon

On Friday, February 10, a group of semantic thought leaders from around the globe met with their host and colleague, Paul Miller, for the latest installment of the Semantic Link, a monthly podcast covering the world of semantic technologies. This episode includes a discussion about data; specifically, the recently announced "wikidata" project with special guest, Denny Vrandecic.
At the recent SemTechBiz Berlin conference, Denny presented a talk titled "Wikidata: The Next Big Thing for Wikipedia." As evidenced by the "Wows" expressed by the panelists in this month's podcast call, this is indeed a big deal for Wikipedia and for the Semantic Web.

Read more

Introduction to: RDFa

Simply put, RDFa is another syntax for RDF. The interesting aspect of RDFa is that it is embedded in HTML. This means that you can state what things on your HTML page actually mean. For example, you can specify that a certain text is the title of a blog post, the name of a product, or the price of a certain product. This is starting to be commonly known as "adding semantic markup."

Historically, RDFa was specified only for XHTML. Currently, RDFa 1.1 is specified for XHTML and HTML5. Additionally, RDFa 1.1 works for any XML-based language such as SVG. Recently, RDFa Lite was introduced as “a small subset of RDFa consisting of a few attributes that may be applied to most simple to moderate structured data markup tasks.” It is important to note that RDFa is not the only way to add semantics to your webpages. Microdata and Microformats are other options, and I will discuss this later on. As a reminder, you can publish your data as Linked Data through RDFa. Inside your markup, you can link to other URIs or others can link to your HTML+RDFa webpages.
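To make the idea concrete, here is a toy extractor for the RDFa Lite `property` attribute, using only Python's standard library. The HTML fragment is invented for the example, and the parser deliberately ignores prefixes, nesting, and most of the RDFa processing model; it is a sketch of the concept, not a conforming RDFa processor:

```python
from html.parser import HTMLParser

# A minimal RDFa Lite fragment. The vocabulary is schema.org,
# but the page content itself is made up for this example.
DOC = """
<div vocab="http://schema.org/" typeof="BlogPosting">
  <h1 property="headline">Introduction to RDFa</h1>
  <span property="author">Jane Doe</span>
</div>
"""

class RDFaLiteExtractor(HTMLParser):
    """Collect (property, text) pairs from `property` attributes.

    A toy: real RDFa processors handle @resource, @typeof chaining,
    prefixes, and much more.
    """
    def __init__(self):
        super().__init__()
        self.current = None  # property name awaiting its text
        self.facts = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "property" in attrs:
            self.current = attrs["property"]

    def handle_data(self, data):
        if self.current and data.strip():
            self.facts[self.current] = data.strip()
            self.current = None

p = RDFaLiteExtractor()
p.feed(DOC)
print(p.facts["headline"])  # Introduction to RDFa
```

The same fragment is still an ordinary HTML page to a browser; the `property` attributes are invisible to readers but turn the visible text into machine-readable statements.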

Why publish RDFa? Read more

BioBlitz 2011: A Little Semantics Goes A Long Way

This post was co-authored with Kevin Lynch.

In October, BioBlitz 2011 took place in Tucson's Saguaro National Park East and West. Thousands of volunteers worked together to discover the biodiversity of this marvelous place I call home. This blog entry outlines the work we've done over the last few months, explains why BioBlitz matters (the reasons might surprise you), and makes a call to photographers to help us test our crowdsourced image classification process.

The Team – National Geographic, National Park Service, Encyclopedia of Life, National Park Foundation

People from around the country worked hard to make BioBlitz successful. There were – and still are – a lot of moving parts. The Park is over 100 square miles, 70% of which is officially “wilderness,” which means, among other things, no wheels allowed! Read more