While much of the publishing industry is still getting up to speed on what semantic technology can do for business, it’s already deep within the DNA of The Tribune Company – to the point where Keith DeWeese, Director, Information and Semantics Management, can comfortably use the word “ontology” in discussions with non-tech employees, and enjoy the fact that they’re equally comfortable using it themselves.
DeWeese has been with the company since 2007, putting in place a sophisticated semantic system for auto-tagging and indexing content using natural language processing and controlled vocabularies, and leveraging its taxonomy for projects such as providing advanced search functionality. Thanks to building a collaborative communication channel with Tribune executives, producers, and editors, “now I actually am in meetings with executives who say how exciting it is that we now can be part of a community of people applying semantic technologies to content,” he says. “The other day I was at a meeting where a top executive used the word ontology all the time. I kept smiling and later I thanked her.”
Closely engaging with his business customers also is helping make it possible to push the semantic vision further at the company.
For example, the content management system provides an automated feedback mechanism for producers and editors, which helps his team gauge where automated semantic processing is working well and where the logic, algorithms, or other components need improvement. “You can’t be off on your own little peninsula” working on the technology without listening to your customers, says DeWeese, who will be discussing aspects of the cultural evolution that has been critical to The Tribune’s semantic journey as part of the case study he is presenting at the upcoming SemTech Biz 2012 in San Francisco.
In 2010 The Tribune had great success revamping its site search with a number of different semantic-tech pieces of functionality for relating topics, filtering, and refining results for the end user. Last year it also saw success using its semantic framework to drive even more relevancy, such as pushing out content to users based on what it understands about their interests; it gets such information, for example, for its LA Times property from a registration paywall where users can define their interests. Now it’s going through another search project to further refine and enhance the work, “to make sure we provide the results users want, and this carries through in terms of our toolset and vocabulary that we understand exactly what they are looking for,” he says.
That’s all a highly critical and operational part of the business now, and DeWeese is moving ahead to put more focus on how The Tribune might start employing semantic web standards such as RDF and OWL. It’s no small consideration for an organization with as much content as The Tribune, whose operations span print and broadcast properties across the country. “We’ve put a lot of energy into how do we add semantic technologies to our content so people can get our content easier,” he says. But with so much data, “we can’t [get up and running with] RDF overnight. It’s more of a strategic approach we have to take over time.”
Plus, like many other publishing businesses, The Tribune has had its tough times in the last few years, including having declared bankruptcy. So, despite being a major media presence, it’s not as if the resources are there to just jump in and make things happen right away. The approach has to be to have a strong design before build, and leverage third-party resources if necessary to keep projects cost-effective and efficient, he says. The cloud, which The Tribune has been leveraging for a couple of years now, may help too. “What about for huge amounts of content, how many triples will you have? We have to step back and say how are we going to do this, but we are learning all the time.”
And there’s excitement about that learning, too. After all, The Tribune already has had the advantage of seeing benefits from the work it’s completed around auto-tagging content and surfacing relationships, and that gives encouragement to DeWeese and others at The Tribune. “The ability to do proof of concept, to see what kind of results you can get, we are there. We can start that, we have that environment,” he says. “Increasingly we are where we can talk about where we’ve had success vs. the more eye-opening moments. We’ve learned a lot about our content and controlled vocabularies, about rethinking NLP, and all the rules and logic definitions we use to support our automated processes. But in the end we have an environment now where we can move toward PoC either in-house or with a third party and not have it become as disruptive as it once would have been, or threatening because of a brittle infrastructure.”
DeWeese credits many people at The Tribune, both in its technology group and in executive management, for staying on top of the business’s pain points and of how semantic technology, including RDFa for embedding RDF data within web pages, can help address them. Together, they’ve taken on the important question: “How do we survive as a major news organization when the old business models for news don’t translate as we thought they would onto the Internet? And if there’s something else we can leverage, we try with every new product, every proof of concept,” he says. “The light becomes brighter all the time.”
If you’d like to hear DeWeese’s story live, and gain more insights into semantic technologies, you can register for SemTech Biz West here.