Mike Bergman recently shared a list of the top ten challenges facing IT over the last ten years and the amazing strides that have been made in each area. Bergman states that in the last ten years, “a whole slew of Grand Challenges in computing hung out there: tantalizing yet not proven. These areas ranged from information extraction and natural language understanding to speech recognition and automated reasoning. But things have been changing fast, and with a subtle steadiness that has caused it to go largely unremarked. Sure, all of us have been aware of the huge changes on the Web and search engine ubiquity and social networking. But some of the fundamentally hard problems in computing have also gone through some remarkable (but largely unremarked) advances.”

He continues, “These advances are perhaps not the realization of artificial intelligence as articulated in the 1950s to 1980s, but are contributing to a machine-based ability to do tasks useful to humans heretofore impossible and at scales unimaginable… The image that is emerging is less one of intelligent machines working autonomously than it is of computers working interactively or semi-automatically with humans to address previously unsolvable problems. By using a perspective of the decade past, we also demark the seminal paper on the semantic Web by Berners-Lee, Hendler and Lassila from May 2001. Yet, while this semantic Web vision has been a contributor to the success of the Grand Challenge advances of the past ten years, I think we can also say that it has not been the key or even a primary driver. That day may still yet come. Rather, I think we have to look to natural language and statistics surrounding large-scale corpora as the more telling drivers.”

The first of Bergman’s grand challenges is advances in information extraction: “Information extraction (IE) uses various forms of natural language processing (NLP) to identify structured information within unstructured or semi-structured documents. These documents are presented in machine-readable form (including straight text, various document formats or HTML) with the various types of information ‘tagged’ or prompted for inclusion. Information types that can be extracted with one of the various techniques include entities, relations, topics, categories, and so forth. Once tagged or extracted, the information in the documents can now be included and linked to standard structured information (as might come from conventional databases) or to structure in other documents.”
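
To make the idea of "tagging" concrete, here is a minimal sketch in Python. It is not Bergman's method or any particular IE system: it assumes a tiny hand-written set of patterns (the `PATTERNS` table, the `PERSON` and `DATE` labels, and the function names are all hypothetical, chosen for illustration), whereas real IE systems generally rely on trained NLP models rather than fixed rules.

```python
import re

# Hypothetical rules for illustration only: a small name list (a "gazetteer")
# and a month-plus-year pattern. Real systems would use trained models.
MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
PATTERNS = {
    "PERSON": re.compile(r"\b(?:Berners-Lee|Hendler|Lassila)\b"),
    "DATE": re.compile(r"\b(?:" + MONTHS + r") \d{4}\b"),
}

SAMPLE = (
    "The semantic Web paper by Berners-Lee, Hendler and Lassila "
    "appeared in May 2001."
)


def extract_entities(text):
    """Return (label, matched_text, start_offset) tuples in document order."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((label, match.group(), match.start()))
    return sorted(found, key=lambda entity: entity[2])


def tag(text):
    """Return the text with each extracted entity wrapped as [LABEL: value]."""
    pieces = []
    last = 0
    for label, value, start in extract_entities(text):
        pieces.append(text[last:start])       # untouched text before the entity
        pieces.append(f"[{label}: {value}]")  # the tagged entity itself
        last = start + len(value)
    pieces.append(text[last:])                # trailing text after the last entity
    return "".join(pieces)


if __name__ == "__main__":
    for entity in extract_entities(SAMPLE):
        print(entity)
    print(tag(SAMPLE))
```

The tuples returned by `extract_entities` are the "structured information" in this sketch: because each carries a label and a value, they could be matched against rows in a conventional database, which is the linking step the quote describes.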

Read the rest of the ten here.

Image: Courtesy of Flickr/dullhunk