Chloe Green of Information Age recently wrote, “Handling immense data sets requires a combination of scientific and technological skills to determine how data is stored, searched and accessed. In science, the importance of data scientists in ensuring that data is handled correctly from the outset is not underestimated; other industries can learn from the scientific approach. Text-mining tools and the use of relevant taxonomies are essential. If we think about big data as a huge number of data points in some multi-dimensional space, the problem is one of analysis, i.e. frequently finding very similar or very dissimilar points which cannot be compared. In life sciences, taxonomies assign data points a class, thus comparison of two points is as easy as looking up other data points in the same class.”

Green continues, “Without taxonomies, the only way to find data points comparable to the one of interest [is] to compute the distance of this point to every other point in the space, which is a huge number of computations. Taxonomies provide enormous speed for big data analysis. Taxonomies combined with semantic technology and text-mining tools provide a more efficient way to discover the relevant content. Text-mining has generally been a largely manual process, but recent advances in technology such as text-mining automation have transformed the process. Technology has been able to play an important role in complementing the human element of text-mining and curation.”
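To make Green's point about speed concrete, here is a minimal Python sketch contrasting the two approaches she describes: measuring the distance from one point to every other point, versus looking up the points that share a taxonomy class. The sample data, class labels, and function names are hypothetical illustrations, not anything taken from her article.

```python
from collections import defaultdict
from math import dist

# Hypothetical data points: (name, taxonomy class, coordinates).
points = [
    ("a", "kinase", (0.0, 0.0)),
    ("b", "kinase", (0.1, 0.2)),
    ("c", "receptor", (5.0, 5.0)),
    ("d", "receptor", (5.1, 4.9)),
    ("e", "kinase", (0.3, 0.1)),
]

def nearest_brute_force(query, all_points):
    """Without a taxonomy: compute the distance to every other point."""
    name, _, coords = query
    others = [p for p in all_points if p[0] != name]
    return min(others, key=lambda p: dist(coords, p[2]))

def build_class_index(all_points):
    """Group points by taxonomy class once, up front."""
    index = defaultdict(list)
    for p in all_points:
        index[p[1]].append(p)
    return index

def comparable_by_class(query, index):
    """With a taxonomy: a single lookup returns the comparable points."""
    name, cls, _ = query
    return [p for p in index.get(cls, []) if p[0] != name]

index = build_class_index(points)
query = points[0]
same_class = [p[0] for p in comparable_by_class(query, index)]
closest = nearest_brute_force(query, points)[0]
print(same_class, closest)
```

The brute-force search performs one distance calculation per point in the data set for every query, while the class lookup only touches the points already filed under the same class, which is where the saving comes from as the data set grows.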

She goes on, “Text-mining extracts key data from multiple and disparate data sources; the resulting nuggets of data have enormous value and can help users make associations where none existed before. By investing in the right technology, businesses can ensure their employees are able to get the most value from the data available to them.”

Read more here.

Image: Courtesy Flickr/ Sick Sad M!kE