Synthesys, Digital Reasoning’s machine learning platform that ferrets out meaning in unstructured data at scale, is bringing its smarts to compliance use cases for organizations such as financial institutions. (See this article for more insight into the technology behind the company’s software.)

This week, the vendor delivered Version 3.7 of the Synthesys software, which brings with it the capability to monitor and analyze all email communications in near real-time. That matters to many compliance program use cases, among them insider trading, money laundering and reputation management. “They all go back to finding information inside of communications, like who are the people and organizations mentioned in email, and what is being discussed about them,” says Tim Estes, chairman and CEO. “Synthesys can take essentially millions of emails and winnow them to maybe a hundred that are problems.”

That means fewer messages are falsely flagged as issues, there’s less intrusion into the privacy of innocent emails, and there’s a better return on time for the people charged with protecting customers and enforcing compliance requirements.

Estes says it’s become a C-level and board-level imperative in the financial industry to have “a deeper and more proactive solution on compliance, and one that can scale up to Big Data. That hasn’t existed.” A typical approach, when a concern presents itself, is to draw up keyword lists, which can get pretty massive, hope you’ve guessed all the ways people might refer to something that crosses a compliance boundary, and be prepared to wade through a whole lot of emails that turn out to be harmless. (For example, Estes says, you might turn up emails referring to employees as assets, rather than the ones about an M&A deal that unfortunately blab about the potential acquiree’s assets.) E-discovery tools, he adds, offer cool technology to help but are generally designed to analyze information in reaction to an issue.
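To make the keyword-list problem concrete, here is a minimal Python sketch of that approach. The keyword list, the sample emails, and the `keyword_flag` helper are all invented for illustration and have nothing to do with any vendor's product. A plain keyword match cannot tell the harmless "assets" from the risky one:

```python
import re

# Hypothetical keyword list; real lists can run to thousands of terms.
KEYWORDS = ["asset", "acquisition"]

def keyword_flag(email_text, keywords=KEYWORDS):
    """Return True if any keyword (optionally pluralized) appears as a whole word."""
    return any(
        re.search(r"\b" + re.escape(k) + r"s?\b", email_text, re.IGNORECASE)
        for k in keywords
    )

# Made-up sample emails.
emails = [
    "Our new hires are real assets to the team.",                   # harmless
    "Keep quiet about the target's assets until the deal closes.",  # risky
    "Lunch at noon?",                                               # harmless
]

# Both "assets" emails are flagged; the matcher has no notion of context.
flagged = [e for e in emails if keyword_flag(e)]
```

A reviewer would have to read both flagged emails to find the one that matters, and that manual load grows with the length of the keyword list.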

“Our model is different. We come from the intelligence domain and don’t want to deal with missing something,” he says. Synthesys, which can search and understand structured and unstructured data to build a view of underlying entities (including different names that refer to the same thing), facts, relationships, and associated temporal and geospatial patterns, is designed to get ahead of the problem. Working from an upfront understanding of the risks or anomalies a company is looking for, Synthesys is there as every piece of email communication flows through an organization.
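The idea of different names referring to the same thing can be illustrated with a toy Python sketch. It assumes a hand-written alias table, and every name in it is made up. It is not how Synthesys resolves entities, which the company says it learns from data at far larger scale:

```python
from collections import defaultdict

# Hypothetical alias table mapping surface forms to one canonical entity.
ALIASES = {
    "acme corp": "ACME",
    "acme corporation": "ACME",
    "acme": "ACME",
    "j. smith": "John Smith",
    "john smith": "John Smith",
}

def resolve(mention):
    """Map a mention to its canonical entity, or keep it as-is if unknown."""
    cleaned = mention.strip()
    return ALIASES.get(cleaned.lower(), cleaned)

def group_mentions(mentions):
    """Group raw mentions under the entity each one resolves to."""
    groups = defaultdict(list)
    for m in mentions:
        groups[resolve(m)].append(m)
    return dict(groups)

groups = group_mentions(["Acme Corp", "ACME", "J. Smith", "Globex"])
```

Here "Acme Corp" and "ACME" end up under one entity, while the unknown "Globex" stays on its own. A fixed table like this breaks down quickly, which is why resolution at the scale Estes describes calls for a learned approach.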

“It’s the equivalent of 1,000 analysts reading stuff all the time, but reading it privately and securely,” he says. “It learns its rules and if nothing gets flagged, then no one has to see it. When exceptions hit, it catches them, and then a human can be brought into the loop.”

An essential piece of the technology included with this version to enable these capabilities for the stream of email communications – as well as for other cases – is streaming ingestion. Synthesys is a Big Data system built on Hadoop, which is fundamentally geared to batch processing and isn’t inherently appropriate for real-time analytics. The open-source Storm distributed realtime computation system, by contrast, can handle large amounts of data in the stream. “We adopted it, so we can take messages one at a time as needed. The system has the knowledge now to keep up with the stream of data,” says Estes. “If you can’t analyze data in the stream, then write it out as soon as analysis is done, you can’t keep up with the stream of email.”

Digital Reasoning was able to add Storm to its ingestion toolkit because of its pluggable architecture. “Right now it supports Hadoop and Storm, so you can use two different systems to do ingestion. But it’s extensible, so a third or fourth system could be supported,” he says.
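A pluggable ingestion layer of the kind Estes describes might look roughly like the Python sketch below. The `Ingestor` interface, the two backends, and the registry are hypothetical and do not use the actual Hadoop or Storm APIs. They only contrast waiting for a full batch with handling messages one at a time:

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterable, Iterator, Type

class Ingestor(ABC):
    """Hypothetical plug-in interface: feed messages to an analysis function."""

    @abstractmethod
    def ingest(self, messages: Iterable[str],
               analyze: Callable[[str], str]) -> Iterator[str]:
        ...

class BatchIngestor(Ingestor):
    def ingest(self, messages, analyze):
        batch = list(messages)      # wait until the whole batch has arrived
        for m in batch:
            yield analyze(m)

class StreamingIngestor(Ingestor):
    def ingest(self, messages, analyze):
        for m in messages:          # take messages one at a time
            yield analyze(m)        # emit each result as soon as it is ready

# Registry of available backends; a third or fourth can be added later.
INGESTORS: Dict[str, Type[Ingestor]] = {
    "batch": BatchIngestor,
    "streaming": StreamingIngestor,
}

def register(name: str, cls: Type[Ingestor]) -> None:
    """Plug in an additional ingestion backend under the given name."""
    INGESTORS[name] = cls

def run(backend: str, messages: Iterable[str],
        analyze: Callable[[str], str]) -> Iterator[str]:
    return INGESTORS[backend]().ingest(messages, analyze)

results = list(run("streaming", ["first email", "second email"], str.upper))
```

Because callers only depend on the `Ingestor` interface, swapping or adding a backend is a one-line `register` call rather than a change to the analysis code.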

Also in this release, entity resolution has taken a big step forward. “We’ve found a way to scale up to billions of entities being considered and eventually trillions of references or pointers,” Estes says. Synthesys is now 20 times more effective at resolving entities. “It’s a pretty big sea change.” In the works for the company in the next year is “really making true the idea that every enterprise can have their own Knowledge Graph, their own Google, if you will, for their proprietary data,” says Estes.