Posts Tagged ‘data provenance’

Financial Execs Worry About Data Lineage; Triple Stores Can Calm Fears

Photo courtesy: Flickr/ FilterForge

The Aite Group, which provides research and consulting services to the international financial services market, spends its fair share of time exploring the data and analytics challenges the industry faces. Senior analyst Virginie O’Shea commented on many of them during a webinar this week sponsored by enterprise NoSQL vendor MarkLogic.

The picture she sketched out includes: dealing with multiple data feeds from a variety of systems; feeding information to hundreds of end users with different priorities about what they need to see and how they need to see it; the lack of a common internal taxonomy across the organization that would enable a single identifier for particular data items; the toll that ETL, cleansing, and reconciliation take on agile data delivery; and the limitations in cross-referencing and linking instruments and data to other data, which exact a price on data governance and quality.

Read more

Making the Case for Semantic Tech in the Financial Sector

Wall Street

Amir Halfon of MarkLogic recently discussed the ways that semantic technologies can create value in the financial sector, among other industries. One such way is through data provenance: “Due to the increased focus on data governance and regulatory compliance in recent years, there’s a growing need to capture the provenance and lineage of data as it goes through its various transformation and changes throughout its lifecycle. Semantic triples provide an excellent mechanism for capturing this information right along with the data it describes. A record representing a trade for instance, can be ‘decorated’ with information about the source of the different elements within it (e.g.: Cash Flow -> wasAttributedTo -> System 123). And this information can be continuously updated as the trade record changes over time, again without the constraints of a schema, which would have made this impossible.”

Read more
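The "decoration" idea Halfon describes can be sketched in plain Python, with tuples standing in for RDF triples. The identifiers below (trade:123, cashflow:123-a, sys:System123) are illustrative, not real system names; the wasAttributedTo predicate comes from the W3C PROV vocabulary the example in the quote echoes.

```python
# Sketch: provenance "decorations" on a trade record as
# subject-predicate-object triples. Because triples are schema-free,
# new attributions can be appended at any time without a migration.

triples = [
    # The trade record itself and one of its elements
    ("trade:123", "rdf:type", "fin:Trade"),
    ("trade:123", "fin:hasCashFlow", "cashflow:123-a"),
    # Provenance: which upstream system produced the cash-flow element
    ("cashflow:123-a", "prov:wasAttributedTo", "sys:System123"),
]

def provenance_of(subject, triples):
    """Return every system a data element is attributed to."""
    return [o for s, p, o in triples
            if s == subject and p == "prov:wasAttributedTo"]

# As the trade changes over time, provenance is simply appended --
# no schema constrains which predicates a subject may carry.
triples.append(("cashflow:123-a", "prov:wasAttributedTo", "sys:System456"))

print(provenance_of("cashflow:123-a", triples))
# prints ['sys:System123', 'sys:System456']
```

In a production triple store the same pattern would be expressed in RDF and queried with SPARQL; the point here is only that lineage facts live alongside the data they describe.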

Hadoop Meets Semantic Technology: Data Scientists Win

Hadoop is on almost every enterprise’s radar – even if they’re not yet actively engaged with the platform and its advantages for Big Data efforts. Analyst firm IDC earlier this year said the market for software related to the Hadoop and MapReduce programming frameworks for large-scale data analysis will have a compound annual growth rate of more than sixty percent between 2011 and 2016, rising from $77 million to more than $812 million.

Yet challenges remain in leveraging all the possibilities of Hadoop, an Apache Software Foundation open source project, especially when it comes to empowering the data scientist. Hadoop is composed of two sub-projects: HDFS, a distributed file system built on a cluster of commodity hardware so that data stored on any node can be shared across all the servers, and the MapReduce framework for processing the data stored in those files.

Semantic technology can help solve many of these challenges, Michael A. Lang Jr., VP, Director of Ontology Engineering Services at Revelytix, Inc., told an audience gathered at the Semantic Technology & Business Conference in New York City yesterday.

Read more