Jennifer Zaino recently wrote an article for our sister website DATAVERSITY on the evolving field of NoSQL databases. Zaino wrote, “Hadoop HBase. MongoDB. Cassandra. Couchbase. Neo4j. Riak. Those are just a few of the sprawling community of NoSQL databases, a category that originally sprang up in response to the internal needs of companies such as Google, Amazon, Facebook, LinkedIn, Yahoo and more – needs for better scalability, lower latency, greater flexibility, and a better price/performance ratio in an age of Big Data and Cloud computing. They come in many forms, from key-value stores to wide-column stores to data grids and document, graph, and object databases. And as a group – however still informally defined – NoSQL (considered by most to mean ‘not only SQL’) is growing fast. The worldwide NoSQL market is expected to reach $3.4 billion by 2018, growing at a CAGR of 21 percent between last year and 2018, according to Market Research Media.”
Last week SindiceTech announced the availability of its SindiceTech Freebase Distribution for the cloud (see our story here). SindiceTech has completed its separation from the university setting in which it was incubated, the former DERI institute, now part of the Insight Centre for Data Analytics, and is now relaunching its activities, with more new solutions and capabilities on the way.
“The first thing was to launch the Knowledge Graph distribution in the cloud,” says CEO Giovanni Tummarello. “The Freebase distribution showcases how it is possible to quickly have a really large Knowledge Graph in one’s own private cloud space.” The distribution comes instrumented with some of the tools SindiceTech has developed to help users both understand and make use of the data, he says, noting that “the idea of the Knowledge Graph is to have a data integration space that makes it very simple to add new information, but all that power is at risk of being lost without the tools to understand what is in the Knowledge Graph.”
Included in the first round of the distribution’s tools for composing queries and understanding the data as a whole are the Data Types Explorer (in both tabular and graph versions) and the Assisted SPARQL Query Editor. The next releases will increase the number of tools and provide updated data. “Among the tools expected is an advanced Knowledge Graph entity search system based on our newly released SIREn search system,” he says.
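To make “understanding the data as a whole” concrete: one basic summary a types overview can provide is how many entities belong to each type, roughly what a SPARQL 1.1 query such as `SELECT ?type (COUNT(DISTINCT ?s) AS ?n) WHERE { ?s a ?type } GROUP BY ?type` would return. The sketch below computes that count over a handful of in-memory triples. It is an illustration only, not SindiceTech’s implementation; the entity IDs, type names, and the `count_types` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical Freebase-style triples (subject, predicate, object).
# All IDs and type names are invented for illustration.
RDF_TYPE = "rdf:type"
triples = [
    ("m:alice", RDF_TYPE, "people.person"),
    ("m:bob", RDF_TYPE, "people.person"),
    ("m:bob", RDF_TYPE, "music.artist"),
    ("m:acme", RDF_TYPE, "business.company"),
    ("m:alice", "people.person.employer", "m:acme"),
]


def count_types(triples):
    """Count distinct subjects per type, like GROUP BY ?type with COUNT(DISTINCT ?s)."""
    members = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            members.setdefault(o, set()).add(s)
    return Counter({t: len(subjects) for t, subjects in members.items()})


# Print the types from most to least populated.
for type_name, n in count_types(triples).most_common():
    print(type_name, n)
```

On a graph the size of Freebase this aggregation would run inside the triple store rather than in Python; the point here is only the shape of the summary.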
Jeff Bertolucci of InformationWeek reports, “Computers do many things faster and more efficiently than the human brain, but they’re decidedly inferior when it comes to extracting meaning from human language. As BigData-Startups.com founder Mark van Rijmenam writes in a recent blog post, the key stumbling block here is that computers understand ‘unambiguous and highly structured’ programming language, while human language is a minefield of nuance, emotion, and implied intent. Van Rijmenam also quotes a Chronicle of Higher Education post by Geoffrey Pullum, a professor of general linguistics at the University of Edinburgh. Pullum outlines three prerequisites for computers to master human language: ‘First, enough syntax to uniquely identify the sentence; second, enough semantics to extract its literal meaning; and third, enough pragmatics to infer the intent behind the utterance, and thus discerning what should be done or assumed given that it was uttered.’ ”
Elasticsearch 1.0 launches today, combining Elasticsearch’s real-time search and analytics with Logstash (which collects logs and other event data from your systems and stores them in a central place) and Kibana (for graphing and analyzing logs) in an end-to-end stack designed to be a complete platform for data interaction. This first major release, which aims to deliver actionable insights in real time from almost any type of structured or unstructured data source, follows on the heels of Elasticsearch Marvel, a commercial monitoring solution that gives users insight into the health of their Elasticsearch clusters.
Organizations from Wikimedia to Netflix to Facebook today take advantage of Elasticsearch, which VP of Engineering Kevin Kluge says has been distinguished, since its open-source start four years ago, by its focus on distributed, real-time search. The JSON-native, RESTful search tool “has intelligence where when it gets a new field that it hasn’t seen before, it discerns from the content of the field what type of data it is,” he explains. Users can define schemas explicitly, or work more freeform and quickly add new kinds of data while still benefiting from easier management and administration, he says.
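The behavior Kluge describes, guessing a field’s type the first time the field appears, can be sketched in a few lines. This is a simplified, hypothetical illustration: the type names are chosen to resemble Elasticsearch’s, and the rules (booleans, whole numbers, floating-point numbers, ISO-8601-style date strings, other strings, objects, arrays typed by their first non-null element) loosely follow its documented dynamic mapping, but this is not Elasticsearch’s code, and nested objects are not recursed into. The `infer_type` and `update_mapping` helpers are invented for this example.

```python
import json
import re

# Hypothetical, simplified "dynamic mapping": guess a field's type from the first
# value seen. Type names resemble Elasticsearch's; the rules are NOT its actual code.
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?)?")


def infer_type(value):
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "date" if DATE_RE.fullmatch(value) else "string"
    if isinstance(value, dict):
        return "object"  # a real engine would also map the inner fields
    if isinstance(value, list):
        # Arrays take the type of their first non-null element.
        for item in value:
            if item is not None:
                return infer_type(item)
    return None  # null or empty array: the type cannot be decided yet


def update_mapping(mapping, document):
    """Add a guessed type for each unseen field; fields already mapped keep their type."""
    for field, value in document.items():
        if field not in mapping:
            inferred = infer_type(value)
            if inferred is not None:
                mapping[field] = inferred
    return mapping


mapping = {}  # start empty: fully "freeform"
update_mapping(mapping, json.loads('{"title": "Hello", "views": 42}'))
update_mapping(mapping, json.loads(
    '{"title": "Hi", "published": "2014-02-12", "score": 4.5, "tags": ["a", "b"]}'))
print(mapping)
```

Passing in a pre-populated `mapping` dict stands in for the optional, explicitly defined schema: fields listed there are never re-guessed, while new fields are still picked up automatically.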
Models also exist for using JSON-LD to represent RDF in a manner that can be indexed by Elasticsearch. The BBC World Service Archive prototype, in fact, uses an Elasticsearch-based index, constructed from the RDF data held in a central triple store, to keep its search engine and aggregation pages fast.
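A common first step in indexing RDF this way is to group triples by subject so that each resource becomes one JSON document, with short field names supplied by a JSON-LD-style `@context`. The sketch below does that grouping by hand. It assumes made-up `example.org` URIs and a hypothetical `to_documents` helper; it does not implement the full JSON-LD compaction algorithm, and it makes no claim about how the BBC prototype actually built its index. Sending the documents to a search index is left out.

```python
import json
from collections import defaultdict

# Hypothetical triples from a central triple store. Objects are tagged as IRIs
# or literals so the two can be told apart. All example.org URIs are invented.
IRI, LIT = "iri", "literal"
triples = [
    ("http://example.org/prog/p1", "http://purl.org/dc/terms/title", (LIT, "Science in Action")),
    ("http://example.org/prog/p1", "http://purl.org/dc/terms/subject", (IRI, "http://example.org/topic/physics")),
    ("http://example.org/prog/p1", "http://purl.org/dc/terms/subject", (IRI, "http://example.org/topic/space")),
    ("http://example.org/prog/p2", "http://purl.org/dc/terms/title", (LIT, "Discovery")),
]

# Short field names for the index, in the spirit of a JSON-LD @context.
CONTEXT = {
    "title": "http://purl.org/dc/terms/title",
    "subject": "http://purl.org/dc/terms/subject",
}
SHORT = {iri: name for name, iri in CONTEXT.items()}


def to_documents(triples):
    """Group triples by subject into one JSON-LD-shaped dict per resource."""
    docs = defaultdict(lambda: defaultdict(list))
    for s, p, (kind, value) in triples:
        key = SHORT.get(p, p)  # fall back to the full predicate IRI if no short name
        # IRIs become node references; literals stay plain strings.
        docs[s][key].append({"@id": value} if kind == IRI else value)
    return [{"@context": CONTEXT, "@id": s, **props} for s, props in docs.items()]


for doc in to_documents(triples):
    print(json.dumps(doc, indent=2))
```

Flattening each resource into a single document is what lets a search engine answer a query with one lookup instead of joining triples at query time, which is the speed benefit the BBC example points to.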
PHILADELPHIA, Feb. 4, 2014 /PRNewswire/ -- The Intellectual Property & Science business of Thomson Reuters, the world’s leading provider of intelligent information for businesses and professionals, today announced the launch of Cortellis™ Data Fusion, an addition to the Thomson Reuters Cortellis suite, the industry’s most comprehensive information solution for drug discovery and development. Cortellis Data Fusion utilizes linked data technologies – frameworks that allow content to be shared across applications and enterprise or community boundaries – connecting users with data from internal proprietary systems as well as third-party resources to address Big Data challenges.
Startup Elementum wants to take supply chains into the 21st century. Incubated at Flextronics, the world’s second-largest contract manufacturer, and launching today with $44 million in Series B funding from Flextronics and Lightspeed Ventures, the company aims to get supply chain participants – the OEMs that generate product ideas and designs, the contract manufacturers who build to those specs, the component makers who supply the parts that go into the product, the logistics hubs that move finished product to market, and the retail customer – to drop one-off relational database integrations and instead see the supply chain fundamentally as a complex graph, or web, of connections.
“It’s no different thematically from how Facebook thinks of its social network or how LinkedIn thinks of what it calls the economic graph,” says Tyler Ziemann, head of growth at Elementum. Built on Amazon Web Services, Elementum’s “mobile-first” apps for real-time visibility, shipment tracking and carrier management, risk monitoring and mitigation, and order collaboration share a back end built to consume and make sense of both structured and unstructured data on the fly. That back end combines real-time Java; a MongoDB NoSQL document database, which scales simply and inexpensively across a global supply chain that fundamentally involves many trillions of records; and a flexible-schema graph database that stores and maps the nodes and edges of the supply chain graph.
“Relational database systems can’t scale to support the types of data volumes we need and the flexibility that is required for modeling the supply chain as a graph,” Ziemann says.
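Modeling the supply chain as a graph turns a question like “who is exposed if this supplier is disrupted?” into a traversal along edges. In a relational schema, each additional tier of suppliers typically needs another join. The sketch below is a hypothetical illustration, not Elementum’s system: node names are invented, the graph is a plain Python dict rather than a graph database, and `downstream` is an ordinary breadth-first search. Reversing the edges answers the opposite question, namely which upstream partners feed a given retailer.

```python
from collections import deque

# Hypothetical supply chain graph. An edge "A -> B" means goods flow from A to B.
# Node names are invented; a real system would keep these in a graph database.
FLOWS = {
    "chip_supplier": ["contract_mfr"],
    "battery_supplier": ["contract_mfr"],
    "contract_mfr": ["port_hub"],
    "port_hub": ["retailer_east", "retailer_west"],
    "retailer_east": [],
    "retailer_west": [],
}


def downstream(graph, start):
    """Breadth-first search: every node reachable from `start`, nearest tiers first."""
    seen, queue, reached = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                reached.append(nxt)
                queue.append(nxt)
    return reached


def reverse(graph):
    """Flip every edge so that traversals run upstream instead of downstream."""
    rev = {node: [] for node in graph}
    for src, dsts in graph.items():
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev


# Risk monitoring: which partners are affected if the chip supplier is disrupted?
print(downstream(FLOWS, "chip_supplier"))
# ['contract_mfr', 'port_hub', 'retailer_east', 'retailer_west']

# Visibility: which upstream partners ultimately feed the east-coast retailer?
print(downstream(reverse(FLOWS), "retailer_east"))
# ['port_hub', 'contract_mfr', 'chip_supplier', 'battery_supplier']
```

The same traversal works unchanged however many supplier tiers the graph has, which is the flexibility the quote above contrasts with relational systems.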
Scott Raynovich of CMSWire recently wrote, “Boston Dynamics, Nest and DeepMind. In the past month, Google has gone on yet another acquisition binge, spending at least $4 billion on a trio of startups that seem only loosely connected — robotics, home automation and artificial intelligence, respectively. Is there a central strategy, and what does it mean to the future of Google, the Internet of Things and Customer Experience? Based on a pattern of deals and feedback from leading experts, it appears Google believes the future is heavily connected to data gathering, machine learning and automation, which all of these companies have in common. ‘In a broader pattern, if Google is focusing on artificial intelligence (AI) and machine learning, how is this kind of semantic understanding going to help us make decisions faster and do our jobs,’ said David Schubmehl, a research director with International Data Corp. (IDC).”
SAN JUAN CAPISTRANO, CA–(Marketwired – Jan 28, 2014) - Predixion Software, a developer of collaborative predictive analytics software solutions, today announced it is providing academic institutions with the free use of its advanced analytics software and related training materials for students and teachers of data science. The Predixion in the Classroom (PIC) program provides future data analysts with hands-on experience building predictive models and extracting valuable insights from large data sets. The PIC program is intended to address the extreme shortage of data scientists, which will grow more severe in coming years as organizations increasingly rely on big data and predictive analytics to guide decisions and improve operations. Academic institutions such as the University of Washington, University of Western Michigan, St. Joseph University, and the University of Maryland University College (UMUC) are already taking advantage of the program to help their students gain the essential skills required for data analysis.
What’s next for the capital markets arena when it comes to unstructured content? According to research and consulting firm TABB Group, which specializes in the stock, bond and money markets, it’s time to turn text analytics on internally generated and disseminated unstructured data, which holds high value for customized intelligence.
In new research, “Inner Voices: Harvesting Text Analytics from Proprietary Data,” research analyst Valerie Bogard and senior analyst Paul Rowady point out that text analytics tools have more use cases than those initially pursued. “Although ultra-low latency trading strategies were an early use case in this space, text analytics is no longer limited to just that,” Bogard said in an email to The Semantic Web Blog. “The use of machine readable news has been widely adopted and all major market data providers incorporate market moving news content into their feeds.”