Adrienne Lafrance of The Atlantic reports, “One of the tasks the human brain best performs is identifying patterns. We’re so hardwired this way, researchers have found, that we sometimes invent repetitions and groupings that aren’t there as a way to feel in control. Pattern recognition is, of course, a skill computers have, too. And machines can group data at scales and with speeds unlike anything a human brain might attempt. It’s what makes computers so powerful and so useful. And seeing the structural framework for patterns across vast systems of categorization can be enormously revealing, too.”
Posts Tagged ‘Evan Sandhaus’
In the winter of 2012, The New York Times began implementing the schema.org-compatible version of rNews, a standard for embedding machine-readable publishing metadata into HTML documents. The goals were to improve the quality and appearance of its search results and to generate more traffic through algorithmically generated links. The semantic markup for news articles added structured data properties to the paper's web pages that define the author, the date a work was created, its editor, its headline, and so on.
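To make "structured data properties" concrete, here is a minimal sketch of how the four properties named above might be attached to an article page using HTML microdata with schema.org's NewsArticle type. The function name `news_article_microdata` and the sample values are hypothetical; only the type URL and the property names (headline, author, editor, dateCreated) come from schema.org. This is not the Times' actual markup.

```python
from html import escape


def news_article_microdata(headline: str, author: str, editor: str, date_created: str) -> str:
    """Render a minimal schema.org NewsArticle wrapper as HTML microdata.

    Hypothetical helper for illustration only. Plain text is used for author
    and editor; fuller markup would nest a schema.org Person item for each.
    """
    return (
        '<article itemscope itemtype="http://schema.org/NewsArticle">\n'
        f'  <h1 itemprop="headline">{escape(headline)}</h1>\n'
        f'  <span itemprop="author">{escape(author)}</span>\n'
        f'  <span itemprop="editor">{escape(editor)}</span>\n'
        # The machine-readable date goes in the datetime attribute.
        f'  <time itemprop="dateCreated" datetime="{escape(date_created)}">'
        f'{escape(date_created)}</time>\n'
        '</article>'
    )


print(news_article_microdata("Example Headline", "Jane Reporter", "John Editor", "2012-01-23"))
```

A search engine that understands schema.org can read these attributes directly instead of guessing which text on the page is the headline or byline.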
But according to a leaked New York Times internal innovation report that appears here, there’s more work to be done in the structured data realm. The report is part of a grand plan to truly put digital first in the face of falling website and smartphone app readership and hotter competition, both from old-guard and new-age newsrooms and from social media properties that are transforming how journalism is delivered to an audience increasingly invested in mobile, social, and personalized technologies.
The report was put together with insights from parties including Evan Sandhaus, director for search, archives and semantics at The NY Times, who was instrumental in the rNews/schema.org effort as well as in the relaunch of TimesMachine, a digital archive of 46,592 issues of The New York Times that can be used, among other things, to surround current news stories with context. The report notes that the Gray Lady has not been standing still in the face of its challenges, citing newsroom efforts to grow its audience such as using data to inform decisions. Still, it says, the paper needs to do more, and faster, to make it easy to get its content in front of digital readers.
Last week, the 11th International Semantic Web Conference (ISWC 2012) took place in Boston. It was an exciting week to learn about the advances of the Semantic Web and current applications.
The first two days, Sunday November 11 and Monday November 12, consisted of 18 workshops and 8 tutorials. The following three days (Tuesday November 13 – Thursday November 15) consisted of keynotes, presentations of academic and in-use papers, the Big Graph Data Panel and industry presentations. It is basically impossible to attend all the interesting presentations, so I am going to do my best to summarize and offer links to everything that I can.
What’s the latest news about rNews? Attendees at the SemTech event in NYC on Tuesday had a chance to find out.
“The future of rNews 1.0 is rNews 1.1,” said Stuart Myles, deputy director of schema standards at the Associated Press, who also heads up the International Press Telecommunications Council’s Semantic Web work. At next week’s IPTC meeting a vote will be taken on version 1.1, with its adoption the hoped-for outcome.
Even as semantic web concepts and tools are underpinning revolutionary changes in the way we discover and consume information, people with even a casual interest in the semantic web have difficulty understanding how and why this is happening. One of the most exciting application areas for semantic technologies is online publishing, yet for thousands of small-to-medium-sized publishers, unfamiliar semantic concepts are intimidating enough to obscure the relevance of these technologies. This three-part series is part of my own journey to better understand how semantic technologies are changing the landscape for publishers of news and information. Read Part 1.
News and media organizations were well represented at the Semantic Technology and Business Conference in San Francisco this year. Among the organizations presenting were the New York Times, the Associated Press (AP), the British Broadcasting Corporation (BBC), Hearst Media Co., Agence France-Presse (AFP), and Getty Images.
It was interesting to note that, apart from the New York Times, which has been publishing a very detailed index since 1912, many of the news organizations presenting at the conference did not make extensive classification of content a priority until the last decade or so. That makes sense. In a print newspaper environment, a detailed index that guides every reader directly to a specific subject mentioned in the paper must not have seemed as critical as it does now, since readers were unlikely to keep the newspaper as future reference material. So the work of indexing news content by subject was left, for the most part, to librarians, well after an article was published.
In the early days of the internet, categorization of content (where it existed) was limited to simple taxonomies or to free tagging. News organizations made rudimentary attempts to identify subjects covered by content, but did not provide much information about relationships between these subjects. Search functions matched the words in the search to the words in the content of the article or feature. Most websites still organize their content this way.
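The word-matching search described above can be sketched in a few lines. This is a simplified illustration under assumed inputs (a hypothetical in-memory dict of article texts), not any particular publisher's search engine; it ranks articles by how many query words appear in their text.

```python
import re


def tokenize(text: str) -> set[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_search(query: str, articles: dict[str, str]) -> list[str]:
    """Return article ids ranked by how many query words each article contains."""
    query_words = tokenize(query)
    scored = []
    for article_id, body in articles.items():
        overlap = len(query_words & tokenize(body))
        if overlap:
            scored.append((overlap, article_id))
    # Highest overlap first; ties broken by id so results are deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [article_id for _, article_id in scored]


articles = {
    "a1": "Hillary Clinton visits Beijing",
    "a2": "State Department announces new envoy",
}
print(keyword_search("Clinton", articles))  # ['a1']
```

Because matching is purely lexical, the State Department story never surfaces for a "Clinton" query, even though a reader might well consider it related.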
The drawback of this approach to online publishing is that it doesn’t make the most of the content “assets” publishers possess. Digital content can be either permanent or ephemeral: it can exist and be accessed by a viewer for as long as the publisher chooses to keep it, and many news organizations are beginning to realize the value of giving their material a longer shelf life by presenting it in different contexts. If you have just read an article about, say, Hillary Clinton, you might be interested in a related story about the State Department, or perhaps her daughter Chelsea, or her husband Bill. But how would any content management system be able to serve up a related story if no one had bothered to indicate somewhere what the story is about and how these people and/or concepts are related to one another?
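The question above has a simple mechanical answer once stories are tagged with entities and the relationships between entities are recorded. The sketch below uses hypothetical, hand-built tags and relations (none of this data comes from a real CMS) and returns stories tagged with an entity that is either the same as, or directly related to, an entity in the story just read.

```python
from collections import defaultdict

# Hypothetical data: the entities each story is about...
story_tags = {
    "s1": {"Hillary Clinton"},
    "s2": {"State Department"},
    "s3": {"Chelsea Clinton"},
    "s4": {"Olympics"},
}
# ...and which entities are related to one another.
relations = [
    ("Hillary Clinton", "State Department"),
    ("Hillary Clinton", "Chelsea Clinton"),
    ("Hillary Clinton", "Bill Clinton"),
]


def build_graph(edges):
    graph = defaultdict(set)
    for a, b in edges:
        # Relationships are treated as symmetric in this sketch.
        graph[a].add(b)
        graph[b].add(a)
    return graph


def related_stories(story_id, tags, graph):
    """Stories tagged with an entity equal to, or one hop from, an entity in story_id."""
    seeds = tags[story_id]
    neighbours = set(seeds)
    for entity in seeds:
        neighbours |= graph.get(entity, set())
    return sorted(
        other for other, other_tags in tags.items()
        if other != story_id and other_tags & neighbours
    )


graph = build_graph(relations)
print(related_stories("s1", story_tags, graph))  # ['s2', 's3']
```

Unlike word matching, this finds the State Department and Chelsea Clinton stories for a Hillary Clinton article, but only because someone recorded what each story is about and how the entities connect.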
A packed room at the Semantic Tech & Business Conference in San Francisco played host to the much-anticipated Schema.org panel on Wednesday morning. As W3C semantic activity lead and moderator Ivan Herman had hoped (see this article), the discussion didn’t get bogged down in a duel between RDFa and microdata, but rather emphasized some important accomplishments of the last year and looked forward to future work.
As Herman put it, the only discussion he wanted to have around RDFa was to announce that the proposed RDFa 1.1 recommendations are expected to be published as official W3C standards Thursday, and that there had been a lot of interaction with the schema.org folks to make this usable for them as well.
Wednesday’s panel was composed of: Dan Brickley, of Schema.org at Google; R.V. Guha of Google; Steve Macbeth of Microsoft; Peter Mika of Yahoo!; Jeffrey W. Preston of Disney Interactive Media Group; Evan Sandhaus of The New York Times Company; and Alexander Shubin of Yandex.
Here are highlights of what took place:
Q: What do Google, Microsoft, Yahoo!, Yandex, the New York Times, and The Walt Disney Company have in common?
On June 2, 2011, schema.org was launched with little fanfare, but it quickly received a lot of attention. Now, almost exactly one year later, we have assembled a panel of experts from the organizations listed above to discuss what has happened since and what we have to look forward to as the vocabulary continues to grow and evolve, including up-to-the-minute news and announcements. The panel will take place at the upcoming Semantic Technology and Business Conference in San Francisco.
Moderated by Ivan Herman, the Semantic Web Activity Lead for the World Wide Web Consortium, the panel includes representatives from each of the core search engines involved in schema.org, and two of the largest early implementers: The New York Times and Disney. Among the topics we will discuss will be the value proposition of using schema.org markup, publishing techniques and syntaxes, vocabularies that have been mapped to schema.org, current tools and applications, existing implementations, and a look forward at what is planned and what is needed to encourage adoption and consumption.
Moderator: Ivan Herman, Semantic Web Activity Lead, World Wide Web Consortium
schema.org at Google
Head of Strategic Direction
Mike Van Snellenberg, Principal Program Manager
New York Times Company
Jeffrey W. Preston, Disney Interactive Media Group
These panelists, along with the rest of the more than 120 speakers from SemTechBiz, will be on hand to answer audience questions and discuss the latest work in semantic technologies. You can join the discussion by registering for SemTechBiz – San Francisco today and save $200 off the onsite price.
Evan Sandhaus reports for the New York Times that rNews has finally arrived. He explains, “On January 23rd, 2012, The Times made a subtle change to articles published on nytimes.com. We rolled out phase one of our implementation of rNews – a new standard for embedding machine-readable publishing metadata into HTML documents. Many of our users will never see the change but the change will likely impact how they experience the news. Far beneath the surface of nytimes.com lurk the databases — databases of articles, metadata and images, databases that took tremendous effort to develop, databases that the world only glimpses through the dark lens of HTML.”
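Embedded metadata is what lets software see past "the dark lens of HTML." As a rough illustration of the consumer side, the following sketch uses Python's standard `html.parser` to pull `itemprop` values out of simple, flat microdata. The class name `ItempropExtractor` is hypothetical, the parser deliberately ignores nested items, and real crawlers use far more complete extractors.

```python
from html.parser import HTMLParser


class ItempropExtractor(HTMLParser):
    """Collect itemprop values from simple, flat microdata (a simplified sketch).

    Reads the 'content' or 'datetime' attribute when present; otherwise uses
    the element's text. Nested itemscope blocks are not handled.
    """

    def __init__(self):
        super().__init__()
        self.props = {}
        self._pending = None  # itemprop whose text content is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        if not name:
            return
        value = attrs.get("content") or attrs.get("datetime")
        if value is not None:
            self.props[name] = value
        else:
            self._pending = name
            self.props[name] = ""

    def handle_data(self, data):
        if self._pending:
            self.props[self._pending] += data

    def handle_endtag(self, tag):
        if self._pending:
            self.props[self._pending] = self.props[self._pending].strip()
            self._pending = None


html_doc = """<article itemscope itemtype="http://schema.org/NewsArticle">
  <h1 itemprop="headline">Example Headline</h1>
  <meta itemprop="dateCreated" content="2012-01-23">
</article>"""

parser = ItempropExtractor()
parser.feed(html_doc)
parser.close()
print(parser.props)  # {'headline': 'Example Headline', 'dateCreated': '2012-01-23'}
```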
[UPDATE - November 9, 2011: the IPTC rNews version 1.0 documentation is now available.]
Today (Oct. 7, 2011), at a gathering of the International Press Telecommunications Council (IPTC), rNews took the step from being a proposal to being a formal standard. rNews was created by the IPTC and made its public debut earlier this year as a proposal for using RDFa to annotate news-specific metadata in HTML documents.
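For readers unfamiliar with RDFa, the sketch below shows the general shape of RDFa annotation using RDFa Lite 1.1 attributes (`vocab`, `typeof`, `property`). As an explicit assumption, it uses schema.org's vocabulary URL and property names as a stand-in; the rNews 1.0 namespace and term list are not reproduced here, and the helper name `rdfa_news_article` is hypothetical.

```python
from html import escape

# Stand-in vocabulary: schema.org, which later absorbed rNews-derived terms.
# This is NOT the rNews 1.0 namespace.
VOCAB = "http://schema.org/"


def rdfa_news_article(fields: dict[str, str]) -> str:
    """Wrap simple text fields in RDFa Lite attributes on a NewsArticle node."""
    lines = [f'<div vocab="{VOCAB}" typeof="NewsArticle">']
    for prop, value in fields.items():
        lines.append(f'  <span property="{escape(prop)}">{escape(value)}</span>')
    lines.append("</div>")
    return "\n".join(lines)


print(rdfa_news_article({"headline": "Example Headline", "dateline": "NEW YORK"}))
```

The same facts expressed in microdata would use `itemscope`/`itemtype`/`itemprop` instead; the choice of syntax was a recurring debate in the schema.org community.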
Congratulations to the IPTC and the leaders of the rNews standardization effort: Andreas Gebhard (Getty Images), Evan Sandhaus (New York Times), and Stuart Myles (Associated Press).
When it comes to schema.org, there’s some good news – and some ‘eh’ news.
Let’s start with the positive stuff. Today the schema.org blog announced that schema.org has added new properties, based on the rNews standard from the International Press Telecommunications Council (IPTC), to its NewsArticle type and related types such as CreativeWork.