Posts Tagged ‘New York Times’

Matthew Ingram of GigaOM recently wrote, “You might not think an applied mathematician who does research in biology and has a PhD in theoretical physics would have much to offer a 163-year-old newspaper publisher, but Chris Wiggins, head of the data science team at the New York Times, told attendees at the Structure conference in San Francisco that machine learning can do much the same thing for media companies as it does for research biologists: namely, make sense of a whole pile of data.” Read more
In the winter of 2012, The New York Times began implementing the schema.org-compatible version of rNews, a standard for embedding machine-readable publishing metadata into HTML documents, to improve the quality and appearance of its search results and to generate more traffic through algorithmically generated links. This semantic markup for news articles brought structured data properties to its web pages, defining the author, the date a work was created, its editor, its headline, and so on.
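As an illustrative sketch of what such markup looks like (the property names come from schema.org’s NewsArticle type; the article values here are hypothetical, and the Times’ actual templates may differ), the following renders an HTML fragment carrying those machine-readable properties as microdata:

```python
# Sketch: render schema.org NewsArticle microdata for an article page,
# the kind of structured data the rNews/schema.org effort embeds in HTML.
# Property names (headline, author, dateCreated, editor) are from
# schema.org's NewsArticle type; the example values are hypothetical.
from html import escape

def news_article_microdata(headline, author, date_created, editor):
    """Return an HTML fragment with machine-readable publishing metadata."""
    return (
        '<article itemscope itemtype="http://schema.org/NewsArticle">\n'
        f'  <h1 itemprop="headline">{escape(headline)}</h1>\n'
        f'  <span itemprop="author">{escape(author)}</span>\n'
        f'  <time itemprop="dateCreated" datetime="{escape(date_created)}">'
        f'{escape(date_created)}</time>\n'
        f'  <span itemprop="editor">{escape(editor)}</span>\n'
        '</article>'
    )

print(news_article_microdata(
    "Example Headline", "A. Reporter", "2012-02-16", "An Editor"))
```

A search engine crawling a page marked up this way can read the headline, author, creation date, and editor directly from the `itemprop` attributes rather than guessing them from the surrounding prose.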
But according to a leaked New York Times internal innovation report that appears here, there’s more work to be done in the structured data realm. It forms part of a grand plan to truly put digital first in the face of falling website and smartphone app readership, and of hotter competition from old-guard and new-age newsrooms alike, as well as from social media properties that are transforming how journalism is delivered to an audience increasingly invested in mobile, social, and personalized technologies.
The report was put together with insights from parties including Evan Sandhaus, director for search, archives and semantics at The New York Times, who was instrumental in the rNews/schema.org effort as well as the relaunch of TimesMachine, a digital archive of 46,592 issues of The New York Times used, among other things, to surround current news stories with historical context. While the report notes that the Gray Lady has not been standing still in the face of its challenges – citing newsroom efforts to grow its audience, such as using data to inform decisions – it needs to do more, faster, to make it easy to get its content in front of digital readers.
Now you can get a master’s degree in Computer Science from a prestigious university online. The New York Times has reported that the Georgia Institute of Technology is planning to offer the CS degree via the MOOC (massive open online course) model.
According to the Georgia Tech MS Computer Science program of study website, students can choose specializations in topics such as computational perception and robotics, which includes courses in artificial intelligence, machine learning, and autonomous multi-robot systems; interactive intelligence, which includes courses in knowledge-based AI and natural language; or machine learning, which offers electives in areas such as theory and in trading and finance, among other options.
Greg Bates of Programmable Web reports, “The Gray Lady is getting her code on. In Andre Behrens’s New York Times blog, Open, billed as ‘All the code that’s fit to print,’ he recounts events on coding and science held in 2012. Two of the notable events were the well-attended one on Big Data and Smarter Scaling and their Open Source Science Fair. Three speakers graced the Big Data event: Andrew Montalenti the CTO of Parse.ly… James Boehmer, Manager of Search Technology at the New York Times; and Allan Beaufour, CTO of Chartbeat.” Read more
Even as semantic web concepts and tools underpin revolutionary changes in the way we discover and consume information, people with even a casual interest in the semantic web have difficulty understanding how and why this is happening. One of the most exciting application areas for semantic technologies is online publishing, yet for thousands of small-to-medium-sized publishers, unfamiliar semantic concepts make it intimidating to grasp the relevance of these technologies. This three-part series is part of my own journey to better understand how semantic technologies are changing the landscape for publishers of news and information. Read Part 1.
News and media organizations were well represented at the Semantic Technology and Business Conference in San Francisco this year. Among the organizations presenting were The New York Times, the Associated Press (AP), the British Broadcasting Corporation (BBC), Hearst Media Co., Agence France-Presse (AFP), and Getty Images.
It was interesting to note that, outside of The New York Times, which has been publishing a very detailed index since 1912, many news organizations presenting at the conference did not make extensive classification of content a priority until the last decade or so. It makes sense: in a newspaper publishing environment, creating a detailed index guiding every reader directly to a specific subject mentioned in the paper must not have seemed as critical as it does now – readers were unlikely to keep the newspaper for future reference – so the work of indexing news content by subject was left, for the most part, for librarians to do well after an article was published.
In the early days of the internet, categorization of content (where it existed) was limited to simple taxonomies or to free tagging. News organizations made rudimentary attempts to identify subjects covered by content, but did not provide much information about relationships between these subjects. Search functions matched the words in the search to the words in the content of the article or feature. Most websites still organize their content this way.
The drawback of this approach to online publishing is that it doesn’t make the most of the content “assets” publishers possess. Digital content has the potential to be either permanent or ephemeral – it can exist and be accessed by a viewer for as long as the publisher chooses to keep it – and many news organizations are beginning to realize the value of giving their material a longer shelf life by presenting it in different contexts. If you have just read an article about, say, Hillary Clinton, you might be interested in a related story about the State Department, or perhaps her daughter Chelsea, or her husband Bill…. But how would any content management system be able to serve up a related story if no one had bothered to indicate somewhere what the story is about and how these people and/or concepts are related to one another?
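A minimal sketch of the idea (all article IDs, subjects, and relations below are hypothetical toy data, not any real CMS or vocabulary): once each story carries subject tags and the subjects themselves are linked, a related-stories lookup becomes a simple graph expansion.

```python
# Toy data: each article is tagged with the subjects it is about.
articles = {
    "clinton-profile": {"Hillary Clinton"},
    "state-dept-budget": {"State Department"},
    "chelsea-feature": {"Chelsea Clinton"},
    "unrelated-sports": {"Baseball"},
}

# Relations between subjects (e.g., a person to an institution or family
# member) -- the piece of knowledge plain keyword search never captures.
related_subjects = {
    "Hillary Clinton": {"State Department", "Chelsea Clinton", "Bill Clinton"},
}

def related_stories(article_id):
    """Return other articles whose subjects match or relate to this one's."""
    subjects = set(articles[article_id])
    # Expand the article's subjects by one hop through the relation map.
    for s in list(subjects):
        subjects |= related_subjects.get(s, set())
    return sorted(
        other for other, tags in articles.items()
        if other != article_id and tags & subjects
    )

print(related_stories("clinton-profile"))
# -> ['chelsea-feature', 'state-dept-budget']
```

Without the `related_subjects` map, a Hillary Clinton profile shares no literal tags with a State Department story, so keyword matching alone would never connect them; the subject relations are what make the recommendation possible.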
Q: What do Google, Microsoft, Yahoo!, Yandex, the New York Times, and The Walt Disney Company have in common?
On June 2, 2011, schema.org was launched with little fanfare, but it quickly received a lot of attention. Now, almost exactly one year later, we have assembled a panel of experts from the organizations listed above to discuss what has happened since and what we have to look forward to as the vocabulary continues to grow and evolve, including up-to-the-minute news and announcements. The panel will take place at the upcoming Semantic Technology and Business Conference in San Francisco.
Moderated by Ivan Herman, the Semantic Web Activity Lead for the World Wide Web Consortium, the panel includes representatives from each of the core search engines involved in schema.org, and two of the largest early implementers: The New York Times and Disney. Among the topics we will discuss are the value proposition of using schema.org markup, publishing techniques and syntaxes, vocabularies that have been mapped to schema.org, current tools and applications, existing implementations, and a look forward at what is planned and what is needed to encourage adoption and consumption.
Moderator: Ivan Herman, Semantic Web Activity Lead, World Wide Web Consortium
Head of Strategic Direction, schema.org at Google
Mike Van Snellenberg, Principal Program Manager
New York Times Company
Jeffrey W. Preston, Disney Interactive Media Group
These panelists, along with the rest of the more than 120 speakers at SemTechBiz, will be on hand to answer audience questions and discuss the latest work in semantic technologies. You can join the discussion by registering for SemTechBiz – San Francisco today (and save $200 off the onsite price).
The Pew Research Center’s Project for Excellence in Journalism State of the News Media 2012 report was just published, and among the findings is that efforts by most top news sites to monetize the web in their own right are still limited. Few news companies, it reports, “have made much progress in some key new digital areas. Among the top news websites, there is little use of the digital advertising that is expected to grow most rapidly, so-called ‘smart,’ or targeted, advertising.”
Failing to make a lot more hay from digital ads is problematic for traditional news companies given the decline in print circulation and in its ad revenue, too. The report says that in 2011, losses in print advertising dollars outpaced gains in digital revenue by a factor of roughly 10 to 1, which it calls an even worse ratio than in 2010.
Get ready for some new apps for Elsevier’s SciVerse framework. Last year Elsevier, which has one of the largest vaults of scientific data in the world, launched its SciVerse Applications module. This provided a way for researchers and scientists to develop and share customized solutions that improve search and discovery of its wealth of integrated content and metadata in the SciVerse hub of ScienceDirect, SciVerse Scopus, SciVerse SciTopics, and targeted web content.
Now it has announced the winners of its Apps For Science competition, social and semantic apps that plug into the framework among them (see above). Elsevier recognizes that when it comes to meeting researchers’ search and discovery needs, it can’t do it all alone. “We’re not going to come up with all the solutions ourselves, so a key goal is to collaborate with developers and researchers to provide tools,” says Rafael Sidi, Vice President Product Management, Applications Marketplace and Developer Network, Elsevier.
A recent article reports, “Many newspapers and other traditional media entities still think of themselves as delivering their content in a specific package… But few are thinking about their businesses in radically different ways — as content-generating engines with multiple delivery methods, or as platforms for data, around which other things can be built. USA Today appears to be moving in this direction, by opening up its data for others to use and even commercialize, following in the footsteps of The Guardian and its ground-breaking open platform.” Read more