Make it as easy to add and connect new data sources into the enterprise analytics infrastructure as it is to add a new web site to the modern web. That’s the goal of next-gen data curation company Tamr, a startup born from an MIT research project to bring together lots of tabular data sources in a scalable and repeatable way.
Just like Google does all the work to find and connect web sites hosting the information that users want, “we want to do the same with tabular data sources inside the enterprise,” says Tamr co-founder and CEO Andy Palmer. “Tamr provides systems of reference. If you are looking for attributes to add to an analysis or want data to support something, you have this reference place to go in the enterprise with a catalogue of all the data that exists across the company.”
Businesses often want to use analytics to address hard questions, but can’t do so successfully unless they integrate lots of disparate data sources and create a referential catalog. With Tamr, Palmer says, they can ingest data sources very quickly into a semantic triple store, make them available in real time, and connect them using machine learning to map attributes and match records. The result is a unified view of a given entity that can then be consumed by various business intelligence and analytics tools. To be usable, he points out, data has to be “very, very thoroughly connected into everything else for there to be context and reference for how it can be consumed and whether it is reliable.”
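To make the ingest, map and match steps concrete, here is a minimal Python sketch. It is not Tamr's implementation: plain string similarity from the standard library stands in for the machine learning Palmer describes, and every name in it (`to_triples`, `map_attributes`, `match_records`, the sample `crm` and `erp` tables, the thresholds) is hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def to_triples(source, rows):
    """Flatten tabular rows into (subject, attribute, value) triples."""
    triples = []
    for i, row in enumerate(rows):
        subject = f"{source}/{i}"  # hypothetical subject naming scheme
        for attr, value in row.items():
            triples.append((subject, attr, value))
    return triples


def map_attributes(attrs_a, attrs_b, threshold=0.6):
    """Map each attribute in attrs_a to its most similar name in attrs_b."""
    mapping = {}
    for a in attrs_a:
        best = max(attrs_b, key=lambda b: similarity(a, b))
        if similarity(a, best) >= threshold:  # assumed cutoff, not tuned
            mapping[a] = best
    return mapping


def match_records(rows_a, rows_b, mapping, threshold=0.8):
    """Pair rows whose mapped attribute values are similar on average."""
    matches = []
    for i, ra in enumerate(rows_a):
        for j, rb in enumerate(rows_b):
            scores = [similarity(str(ra[a]), str(rb[b]))
                      for a, b in mapping.items()]
            if scores and sum(scores) / len(scores) >= threshold:
                matches.append((i, j))
    return matches


# Two made-up sources describing overlapping companies.
crm = [{"company_name": "Acme Corp", "city": "Boston"},
       {"company_name": "Globex", "city": "Springfield"}]
erp = [{"Company": "ACME Corp.", "City": "Boston"},
       {"Company": "Initech", "City": "Austin"}]

triples = to_triples("crm", crm) + to_triples("erp", erp)
mapping = map_attributes(list(crm[0]), list(erp[0]))
matches = match_records(crm, erp, mapping)
```

The all-pairs comparison in `match_records` is quadratic in the number of rows, and name similarity alone would miss synonyms such as "city" and "town", which is part of why a system operating at enterprise scale would need learned models rather than a fixed rule like this one.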