Photo courtesy: Johan Hansson, https://www.flickr.com/photos/plastanka/

A new report from the Securities Technology Analysis Center (STAC), Big Data Cases in Banking and Securities, looks to understand big data challenges specific to banking by studying 16 projects at 10 of the top global investment and retail banks.

According to the report, about half the cases involved a petabyte or more of data. That data spanned both natural language text and highly structured formats, and even the structured data presented a great deal of variety (for example, different departments using the same field for different purposes, or for the same purpose but with a different vocabulary), which made integration a challenge in some cases. The analytic complexity of the workloads studied, the Intel-sponsored report notes, ranged from basic transformations at the low end to machine learning at the high end.
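To make that integration challenge concrete, here is a minimal sketch (not taken from the report) of the kind of field harmonization such variety forces. The department names, field values, and mapping are hypothetical.

```python
# Hypothetical illustration: two departments use the same "status" field
# with different vocabularies, so records must be normalized before they
# can be analyzed together.

# Assumed mapping from each department's vocabulary to a shared, canonical one.
CANONICAL_STATUS = {
    "retail":     {"open": "active", "shut": "closed", "dormant": "inactive"},
    "investment": {"live": "active", "terminated": "closed", "frozen": "inactive"},
}

def normalize(record: dict) -> dict:
    """Rewrite a record's status into the shared vocabulary."""
    dept_map = CANONICAL_STATUS[record["department"]]
    return {**record, "status": dept_map[record["status"]]}

records = [
    {"department": "retail", "account": "A-1", "status": "shut"},
    {"department": "investment", "account": "B-7", "status": "live"},
]

print([normalize(r) for r in records])
# [{'department': 'retail', 'account': 'A-1', 'status': 'closed'},
#  {'department': 'investment', 'account': 'B-7', 'status': 'active'}]
```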

Authored by consultant Jennifer Costley and STAC founder and director Peter Lankford, the report defines big data as a workload that is too difficult or expensive to handle using traditional technologies, largely due to data scale or complexity. The workloads included card fraud detection, securities fraud early warning, enterprise credit risk reporting, tick analytics, social analytics for trading, trade visibility, archival of audit trails, customer data transformation, IT policy compliance analytics, IT operations analytics, and more. One interesting finding of the report was that most of the content related to these workloads was in highly structured formats.

The words volume, variety, and velocity surround the big data discussion, but the report found velocity to be a main driver for deploying new technology in only one use case: social analytics for trading, in which historical social media content is used to develop indicator histories (such as sentiment) for use in trading algorithms. Indicators computed from real-time social feeds then drive those algorithms in production.

“The event‐driven pattern in this workload (as opposed to its historical analysis pattern) is all about maximizing the speed of semantic analysis on incoming textual data,” the report notes. “Too much latency between a market moving tweet and an action by a trading algorithm can turn profit to loss.”
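As a rough illustration of that event-driven pattern, the sketch below scores incoming messages with a toy lexicon and maintains a rolling sentiment indicator. The lexicon, window size, and sample messages are invented for the example and bear no relation to any production system described in the report.

```python
from collections import deque

# Toy sentiment lexicon; a real system would use a trained NLP model.
LEXICON = {"surge": 1, "beat": 1, "rally": 1, "miss": -1, "probe": -1, "plunge": -1}

class RollingSentiment:
    """Keep a rolling average sentiment over the last `window` messages."""

    def __init__(self, window: int = 100):
        self.scores = deque(maxlen=window)

    def on_message(self, text: str) -> float:
        """Score one incoming message and return the updated indicator."""
        words = text.lower().split()
        score = sum(LEXICON.get(w, 0) for w in words)
        self.scores.append(score)
        return sum(self.scores) / len(self.scores)

indicator = RollingSentiment(window=3)
for msg in ["Shares rally after earnings beat",
            "Regulator opens probe",
            "Stock plunge deepens"]:
    level = indicator.on_message(msg)
    print(f"{level:+.2f}  {msg}")
    # A trading algorithm would act as soon as `level` crosses its threshold,
    # which is why latency from message arrival to this point matters.
```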

About a quarter of the projects studied attribute their existence to a desire for improved agility, including faster turnaround on analytics. “In business functions such as fraud detection and automated trading, the speed with which a firm can test new ideas (i.e., algorithms) has a direct impact on how quickly it can react to changing conditions in the external world,” the report notes.

Among the technologies used to solve big data challenges in finance, Hadoop plays a central role, according to the report. Time-sensitive areas such as social analytics for trading leaned on Apache Spark and other distributed, in-memory technologies. NoSQL databases also figured in several projects; among the use cases was a graph database underpinning an IT policy analytics deployment.
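For readers unfamiliar with how such tools are typically applied, here is a minimal PySpark sketch of the kind of in-memory aggregation these projects run over large data sets, in this case tick data. The file name and schema are hypothetical, and the report does not prescribe any particular implementation.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Minimal sketch: aggregate a (hypothetical) tick history in memory with Spark.
spark = SparkSession.builder.appName("tick-analytics-sketch").getOrCreate()

# Assumed input: a CSV of ticks with columns symbol, price, size, ts.
ticks = spark.read.csv("ticks.csv", header=True, inferSchema=True)

# Cache the data set so repeated analytic passes stay in memory.
ticks.cache()

# Per-symbol summary: volume-weighted average price and total traded size.
summary = (
    ticks.groupBy("symbol")
         .agg(
             (F.sum(F.col("price") * F.col("size")) / F.sum("size")).alias("vwap"),
             F.sum("size").alias("total_size"),
         )
)

summary.show()
spark.stop()
```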

The STAC Benchmark Council undertook the research as part of its goal to develop technology benchmark standards based on big data workloads.