Aaron Dubrow of the Texas Advanced Computing Center at the University of Texas recently wrote, “Language isn’t always straightforward, even for humans. The multiple definitions in a dictionary can make it difficult even for people to choose the correct meaning of a word. Katrin Erk, a linguistics researcher in the College of Liberal Arts, refers to this as ‘semantic muck.’ Enabled by supercomputers at the Texas Advanced Computing Center, Erk has developed a new method for visualizing the words in a high-dimensional space. Instead of hard-coding human logic or deciphering dictionaries to try to teach computers language, Erk decided to try a different tactic: feed computers a vast body of texts (which are a reflection of human knowledge) and use the implicit connections between the words to create a map of relationships.”

Erk commented, “An intuition for me was that you could visualize the different meanings of a word as points in space… You could think of them as sometimes far apart, like a battery charge and criminal charges, and sometimes close together, like criminal charges and accusations (‘the newspaper published charges…’). The meaning of a word in a particular context is a point in this space. Then we don’t have to say how many senses a word has. Instead we say: ‘This use of the word is close to this usage in another sentence, but far away from the third use.’ ”
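
To make the "points in space" intuition concrete, here is a toy sketch in Python. It is not Erk's method, which draws on a vast body of text and supercomputers. It only assumes that each use of a word can be represented by a count of the words around it, and that closeness can be measured with cosine similarity. The three sentences, the stopword list, and the function names `context_vector` and `cosine` are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical stopword list, chosen only to keep this toy example small.
STOPWORDS = {"the", "a", "its"}

def context_vector(sentence, target):
    """Represent one use of `target` by counting the other words in its sentence."""
    words = sentence.lower().split()
    return Counter(w for w in words if w != target and w not in STOPWORDS)

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (1.0 = same direction)."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Three made-up uses of the word "charge".
battery = context_vector("the battery lost its charge overnight", "charge")
criminal = context_vector(
    "the court dismissed the criminal charge against the suspect", "charge")
accusation = context_vector(
    "the newspaper published a criminal charge against the suspect", "charge")

# The two legal uses share context words, so they land closer together
# than either does to the battery use.
print(cosine(criminal, accusation) > cosine(battery, criminal))
```

In this sketch nobody has to decide how many senses "charge" has. Each use is simply a point, and the only question asked is which other uses it sits near.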

Read more here, or watch this video about Erk’s work:

Image: Courtesy University of Texas