Ron Powell recently interviewed Jonathan Litchman, an SVP at SAIC, regarding the role of linguistic technologies in Big Data. During the interview Litchman commented, “Big data, as you know, is a term used for lots of different things. When I think about big data, it depends on how big you want to get. If you think about the vast amounts of data that people need to be able to handle in only one language, you have tremendous big data issues; but if you understand that the most effective use of big data is to be more inclusive and make that big data more global, then you have a situation in which your data increases exponentially with the inclusion of multiple languages within that dataset.”
Litchman continued, “Our Omnifluent products help people who want to do analytics and mining on big data be able to do so without having to confront the barriers that different languages pose. Whether it’s multilingual search, translation, summarization, or automatic alignment of a transcript with video or audio, big data has to expand beyond single-language capability in order to be able to understand what’s useful within that big data. There are several features of the product that I think are special. The first is that the translation technology that underlies the Omnifluent platform is really a true hybrid machine translation capability. It’s a combination of machine translation that includes rules-based and statistical engines, each of these engines working together as one within a single decision engine.”
He went on, “An even more interesting feature of this linguistic platform that Omnifluent has is that in addition to that hybrid nature of translation, it also unifies text as well as speech on a single platform. Omnifluent provides automatic speech recognition and machine translation in a hybrid approach that fuses all of these components together, sharing linguistic resources to avoid the problem of compounding errors that result from integrating different pieces of technology. Since these sit on a single unified platform, they share all of the linguistic resources to provide the best possible output.”
Image: Courtesy SAIC