Jennifer Zaino recently wrote an article for our sister website DATAVERSITY on the evolving field of NoSQL databases. Zaino wrote, “Hadoop Hbase. MongoDB. Cassandra. Couchbase. Neo4J. Riak. Those are just a few of the sprawling community of NoSQL databases, a category that originally sprang up in response to the internal needs of companies such as Google, Amazon, Facebook, LinkedIn, Yahoo and more – needs for better scalability, lower latency, greater flexibility, and a better price/performance ratio in an age of Big Data and Cloud computing. They come in many forms, from key-value stores to wide-column stores to data grids and document, graph, and object databases. And as a group – however still informally defined – NoSQL (considered by most to mean ‘not only SQL’) is growing fast. The worldwide NoSQL market is expected to reach $3.4 billion by 2018, growing at a CAGR of 21 percent between last year and 2018, according to Market Research Media.” Read more
Posts Tagged ‘data modeling’
Andy Flint of CloudTech recently wrote, “Analytics depends on data — the more, the merrier. If we’re trying to model, say, the behaviour of customers responding to marketing offers or clicking through a website, we can build a far stronger model with 10,000 samples than with 100. You would think, then, that the rise of Big Data and its seemingly inexhaustible supply of data would be every analyst’s dream. But Big Data poses its own challenges for modeling. Much of Big Data isn’t what we have historically thought of as ‘data’ at all. In fact, 80% of Big Data is raw, unstructured information, such as text, and doesn’t neatly fit into the columns and rows that feed most modeling programs.” Read more
Gary Hamilton of GovHealthIT recently wrote, “Today, the acquisition of patient information for population health management is typically done through Continuity of Care Documents (CCDs). Although the exchange of health information is possible via CCDs, the amount of information they contain can be overwhelming. As such, poring over CCDs to find information relevant to patient populations can be unwieldy and time consuming. With providers challenged to manage information in just one CCD, how can they hope to use these documents to effectively influence care at the population level? The key is to look for ways to use technology to target specific patient information, pinpoint new and relevant information and alert both patients and providers when updated information is available.” Read more
Solidus is looking for a Software Engineer - Data Modeling in Lexington, MA. According to the post, “Solidus is searching for a Software Engineer with very good data modeling skills with UML, great Java and XML programming skills and a strong desire to learn new technologies. The candidate will develop, integrate and test solutions for several inter-related projects. Work will include developing advanced and robust semantic web applications and services. Candidate will work with project teams to produce, test, deliver and support code for research initiatives, internal project collaborations and external project deliverables in operationally relevant environments.” Read more
Simon Rogers of The Guardian recently reviewed what the publication learned about data journalism after covering the London Olympics. Rogers writes, “There was never a guarantee that it would amount to anything for us. The Olympics may have been the only news story in town last week and would undoubtedly produce great journalism, but would it result in data journalism? At its essence, this is the gathering of stories from data. It’s more than just producing a few charts – data visualisation is often the expression of data journalism, but the process of digging through the data to find the stories that matter, that is at its heart.”
He goes on, “At some levels the omens were not good. The key results data is locked up in lucrative deals between the International Olympic Committee and major news organisations. So, those results tables on our site, the BBC, The Telegraph and so on were paid – The Guardian’s is a feed from the New York Times and we were explicitly banned from releasing that feed as open data for you to download and explore with. As I wrote earlier, while it was not the first Open Data Olympics – it was arguably the first data Olympics. So, what can an open data journalism site do in that situation? This is what we learned.”
Image: Courtesy The Guardian
Andrew Phelps reports, “Data nerds from government and academia gathered Friday at Northeastern University to show off the latest version of Weave, an open-source, web-based platform designed to visualize ‘any available data by anyone for any purpose.’ The software has a lot of potential for journalists. Weave is supported by the Open Indicators Consortium, an unusual partnership of planning agencies and universities who wanted better tools to inform public policy and community decision-making. The groups organized and agreed to share data and code in 2008, well before Gov 2.0 was hot.” Read more
With Web 2.0, ontologies are being used to improve search capabilities and to make inferences that support human or computer reasoning. Because an ontology relates terms to one another, the user doesn’t need to know the exact term actually stored in the document. Data Rationalization is a Managed Meta Data Environment (MME)-enabled application that creates or extends an ontology for a domain into the structured data world, based on model objects stored in various models (of varying levels of detail, across model files and modeling tools) and other meta data. Ontology is “the study of the categories of things that exist or may exist in some domain” [1]. An ontology comprises “a collection of taxonomies and thesauri” [2] about a domain. Data models, often unknowingly, express many aspects of ontology, even though they are not stored in OWL or RDF.
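To make the search point concrete, here is a minimal sketch of thesaurus-based query expansion, assuming a tiny hand-written thesaurus and simple substring matching. The names `THESAURUS`, `build_index`, `expand_query`, and `search`, along with the sample documents, are hypothetical illustrations and do not come from the Data Rationalization application or any particular tool.

```python
# Hypothetical mini-thesaurus: each preferred term maps to its related terms.
THESAURUS = {
    "customer": {"client", "account holder"},
    "supplier": {"vendor"},
}


def build_index(thesaurus):
    """Map every term (preferred or related) to its full group of terms."""
    index = {}
    for preferred, related in thesaurus.items():
        group = {preferred} | related
        for term in group:
            index[term] = group
    return index


def expand_query(term, index):
    """Return the term plus everything the thesaurus relates it to."""
    key = term.lower()
    return index.get(key, {key})


def search(term, documents, index):
    """Return documents that mention the term or any related term."""
    terms = expand_query(term, index)
    return [doc for doc in documents if any(t in doc.lower() for t in terms)]


# Sample documents, invented for illustration only.
DOCUMENTS = [
    "Vendor master table with payment terms",
    "Client address history",
    "Product price list",
]
INDEX = build_index(THESAURUS)
```

With this setup, `search("supplier", DOCUMENTS, INDEX)` finds the vendor document even though the word "supplier" never appears in it. A real ontology would carry richer relationships (broader/narrower terms, part-of, and so on) than this flat synonym lookup.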
Master Data Management is now mainstream, and those of us who have practiced it for a few years are battered, bruised and wearily displaying our scars. Typically defined as the people, processes and systems that govern the core data (e.g., products, customers, suppliers) needed to run a business, Master Data Management (or MDM) requires painstaking work in three broad areas: data standardization, architecture, and governance.
I came across this in the Ontolog-Forum, and got permission from Mike Bennett, the author, to quote it. Some great pragmatism here: on the one hand, we need to apply semantics and ontology building to systems development and integration, but at the same time, let’s not get too carried away trying to "model reality."