[NOTE: This guest post is by Peter Haase, Lead Architect for Research and Development, fluid Operations.]
Industry engineers waste a significant amount of time searching for the data they need for their core tasks. When informed about potential problems, diagnosis engineers at Siemens Energy Services, an integrated business unit that runs service centers for power plants, need to access several terabytes of time-stamped sensor data and several gigabytes of event data, including both raw and processed data. These engineers have to respond to about 1,000 service requests per center per year, and they end up spending 80% of their time on data gathering alone. What makes the problem even worse is that their data grows at a rate of 30 gigabytes per day. Similarly, at Statoil Exploration, geology and geophysics experts spend between 30% and 70% of their time looking for and assessing the quality of some 1,000 terabytes of relational data, spread across 2,000 tables in multiple individual databases with diverse schemata. In such scenarios, formulating the queries that answer the experts' information needs can take several days and typically requires the assistance of IT specialists who have worked with the database schemata for years.
Siemens and Statoil Exploration are hardly the only companies losing time to Big Data issues, but the root of these issues is not simply the “big” aspect of their data. The real challenge is mining data for value and insight efficiently and effectively, regardless of its volume.