Speech Recognition and Speech Understanding

The digital revolution has ignited hundreds of smaller revolutions that have transformed technology and communications, creating one of the most influential and life-changing events humanity has experienced in recent times, and possibly in its history. These revolutions continue to reshape our cell phones, computers, cars, and even the Internet, profoundly changing the way we view ourselves and the world.

One of these revolutions likely to experience the greatest number of changes is Speech Recognition.  Speech Recognition stands as a medium, or a communications ambassador, between machines and people, ever promising to deliver “natural speech.”  As a result, Speech Recognition can unify or incorporate many other current technologies, ultimately fusing many of the features and functions they provide.

In the past few years, Speech Recognition has experienced many improvements that have enabled machines, specifically computers, to perform elaborate tasks such as Dictation and Command Recognition.  Thanks to better ways of capturing and analyzing sound, even personal accents and speech impediments can now be taken into account.  As Speech Recognition improves and advances, it is poised to compete directly with an old technology warrior, the keyboard.

Considered an essential and integral part of most computers, the faithful keyboard, which was introduced with typewriters in the 19th century, has represented the principal means for humans to communicate with machines for the greater part of technology’s history.  Nonetheless, the keyboard and its means of communication, typing, are not natural ways for people to convey information and ideas.  The primary vehicle for people to communicate is in fact language, or better said, speech itself.  Therefore, the very moment that machines and humans can effectively communicate naturally, the keyboard will inevitably find a secondary role as a communications ambassador.  The promise of naturally speaking and naturally listening machines is an ever-growing and exciting reality.

Nevertheless, like all evolving technologies throughout history, Speech Recognition faces many of the same obstacles that challenge other communication technologies such as search engines, translation software, spell check, etc.  These obstacles are typically rooted in the Semantics of the information itself, or more specifically in the case of Speech Recognition, in understanding the true meaning of the spoken words.  For example, if a speaker says: “Where is my cell?” dictation software can potentially write two different statements:

  1. Where is my sell, and
  2. Where is my cell

Obviously, of the two possible statements above, only Option 2 (cell) is correct, since Option 1 (sell) makes no sense in this sentence.
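One common way to make this kind of choice is to score each candidate spelling against the words around it. The following is only a minimal sketch of that idea, not the method of any particular dictation product: the homophone table, the bigram counts, and the function names are all invented for illustration.

```python
# Toy bigram counts: how often the second word followed the first in some
# imagined training text. The numbers are invented for illustration only.
BIGRAM_COUNTS = {
    ("my", "cell"): 40,
    ("my", "sell"): 0,
    ("to", "sell"): 55,
    ("to", "cell"): 1,
}

# Hypothetical homophone table: each set shares one pronunciation.
HOMOPHONES = [{"cell", "sell"}]


def candidates(word):
    """Return every spelling that sounds like `word`, including itself."""
    for group in HOMOPHONES:
        if word in group:
            return sorted(group)
    return [word]


def pick_spelling(previous_word, heard_word):
    """Choose the spelling seen most often after `previous_word`."""
    return max(
        candidates(heard_word),
        key=lambda w: BIGRAM_COUNTS.get((previous_word, w), 0),
    )


def transcribe(words):
    """Resolve homophones left to right, using the preceding word as context."""
    result = []
    for word in words:
        previous = result[-1] if result else ""
        result.append(pick_spelling(previous, word))
    return result
```

With these toy counts, transcribe(["where", "is", "my", "sell"]) ends in "cell", while transcribe(["want", "to", "cell"]) ends in "sell". A real recognizer would draw on far larger statistics and longer context than a single preceding word.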

Natural language (human language) is extensively populated and affected by what may be described as “irregular words.”  These irregular words normally don’t follow the standard rules of their intended meanings.  For example:

  • Homophones are words that sound identical but have different written forms or spellings (e.g., sell and cell),
  • Synonyms are words that, although written differently, carry the same meaning (e.g., dog and pooch),
  • Homonyms are identical words that have different meanings (e.g., Cell: the basic element forming tissue, and Cell: the telephonic apparatus/cell phone),
  • Idioms and phrases (e.g., no way and wow!),
  • Collocations are words normally found or fused together, often identifying a completely different meaning (e.g., hot dog is neither hot nor an animal, yet identifies a type of sausage).

In view of the possible misinterpretations that natural language invites, scientific fields such as Natural Language Processing (NLP) and Artificial Intelligence (AI) are increasingly being incorporated into Speech Recognition in hopes of overcoming this language paradox.  NLP and AI are themselves growing, evolving and experiencing many changes and improvements.  Within AI there are many new disciplines constantly reshaping science as we know it.  For example, in late December of 2008, in a new AI experiment implementing one of these new intelligent disciplines, a system was fed the spoken sentence above (Where is my cell).  In the experiment, the system correctly chose Option 2 (cell over sell).  Furthermore, it also showed the ability to differentiate between:

A) Where is my cell (wherein cell identifies the basic element forming living tissue),
B) Where is my cell (wherein cell identifies a cell phone),
C) Where is my cell (wherein cell identifies a group).

This ultimately allowed the system to retain Option B (cell, the apparatus) and Option C (cell, the group).  In fact, both questions would be naturally correct, which lets the system either assume the most common definition of cell (the phone) or, better yet, challenge the genuine ambiguity of the question by asking the speaker: did you mean cell for phone or did you mean cell for group?
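The behavior described above can be sketched as a small sense-selection routine. The article does not say how the 2008 system worked, so everything here is an assumption made for illustration: the sense list, the invented frequency values, the hypothetical "ownable" flag used to drop the tissue sense after "my", and the function names.

```python
# Hypothetical sense inventory for "cell". Frequencies are invented, and the
# "ownable" flag is an assumed stand-in for whatever plausibility test a real
# system would apply to a phrase like "my cell".
SENSES = {
    "cell": [
        {"label": "phone", "frequency": 0.6, "ownable": True},
        {"label": "group", "frequency": 0.1, "ownable": True},
        {"label": "tissue", "frequency": 0.3, "ownable": False},
    ]
}


def plausible_senses(word, possessive):
    """Keep only the senses that fit the context (here: after 'my')."""
    senses = SENSES.get(word, [])
    if possessive:
        senses = [s for s in senses if s["ownable"]]
    return senses


def resolve(word, possessive=True, ask=False):
    """Return the most common plausible sense, or a clarifying question."""
    senses = plausible_senses(word, possessive)
    if not senses:
        return None
    if ask and len(senses) > 1:
        options = " or ".join(f"{word} for {s['label']}" for s in senses)
        return f"Did you mean {options}?"
    return max(senses, key=lambda s: s["frequency"])["label"]
```

Calling resolve("cell") falls back to the most frequent remaining sense, while resolve("cell", ask=True) produces the clarifying question instead. Which of the two is preferable depends on how costly a wrong guess is compared with interrupting the speaker.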

As sciences evolve, so will technologies, and so will the benefits that we all enjoy in our everyday lives. The future promises faster, better, cheaper and more intelligent machines, and Speech Recognition will be at the forefront, bringing together people and machines, simplifying our lives and fulfilling our needs.
