Jennifer Zaino recently wrote an article for our sister website DATAVERSITY on the evolving field of NoSQL databases. Zaino wrote, “Hadoop Hbase. MongoDB. Cassandra. Couchbase. Neo4J. Riak. Those are just a few of the sprawling community of NoSQL databases, a category that originally sprang up in response to the internal needs of companies such as Google, Amazon, Facebook, LinkedIn, Yahoo and more – needs for better scalability, lower latency, greater flexibility, and a better price/performance ratio in an age of Big Data and Cloud computing. They come in many forms, from key-value stores to wide-column stores to data grids and document, graph, and object databases. And as a group – however still informally defined – NoSQL (considered by most to mean ‘not only SQL’) is growing fast. The worldwide NoSQL market is expected to reach $3.4 billion by 2018, growing at a CAGR of 21 percent between last year and 2018, according to Market Research Media.” Read more
Posts Tagged ‘Hadoop’
Next week Hadoop World takes place in New York City. The big event follows on the heels of the official gold release last week of Apache Hadoop 2.0, which significantly overhauls the MapReduce programming model for processing large data sets with a parallel, distributed algorithm on a cluster.
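For readers new to the model, here is a minimal single-process sketch of the map, shuffle, and reduce phases, using word counting as the example. It is plain Python for illustration only, not Hadoop's Java API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical. On a real cluster the framework runs the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Emit a (word, 1) pair for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Group all values by key, as the framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Collapse the values collected for one key into a single result.
    return key, sum(values)


def word_count(lines):
    # Chain the three phases together on one machine.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster"]))
```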
Sitting on top of the Hadoop Distributed File System (HDFS), YARN (Yet-Another-Resource-Negotiator) is meant to perform as a large-scale, distributed operating system for big data applications. Multiple apps can now run at the same time in Hadoop, with the global ResourceManager and the per-node NodeManagers providing a generic system for managing the applications in a distributed way.
Among the YARN-ready applications is Apache Giraph, an iterative graph processing system built for high scalability – and the programming framework that helps Facebook with its Graph Search service of connections across friends, subscriptions, and so on, providing the means for it to express a wide range of graph algorithms in a simple way and scale them to massive datasets. Facebook explained in a post in August that it had modified and used Giraph to analyze a trillion edges, or connections between different entities, in under four minutes.
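To give a sense of what an iterative, vertex-centric graph algorithm looks like, here is a minimal single-machine sketch of PageRank written as a series of supersteps. It is plain Python for illustration, not Giraph's Java API and not Facebook's implementation. The function name `pagerank_supersteps`, the adjacency-dictionary input, and the default parameters are all assumptions made for the example.

```python
def pagerank_supersteps(out_edges, supersteps=20, damping=0.85):
    """Run PageRank as repeated supersteps over a small in-memory graph.

    out_edges maps each vertex to the list of vertices it points to.
    Assumes every target vertex also appears as a key in out_edges.
    Vertices with no outgoing edges simply send nothing in this sketch.
    """
    vertices = list(out_edges)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}

    for _ in range(supersteps):
        # Message phase: each vertex splits its rank among its neighbors.
        inbox = {v: 0.0 for v in vertices}
        for v in vertices:
            targets = out_edges[v]
            if targets:
                share = rank[v] / len(targets)
                for t in targets:
                    inbox[t] += share

        # Compute phase: each vertex updates its rank from received messages.
        rank = {v: (1 - damping) / n + damping * inbox[v] for v in vertices}

    return rank


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["a"], "c": ["a"]}
    for vertex, score in sorted(pagerank_supersteps(graph).items()):
        print(vertex, round(score, 4))
```

In a distributed system of this kind, the two phases inside the loop run in parallel across workers, with each worker holding a partition of the vertices.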
The enterprise version of Bottlenose has formally launched. Now dubbed Nerve Center, the service provides real-time trend intelligence for brands and businesses (The Semantic Web Blog previewed it here). It includes a dashboard featuring live visualization of all trending topics, hashtags and people; top positive and negative influences and sentiment trends; trending images, videos, links and popular messages; the ability to view trending messages by type (complaints vs. endorsements, for example); and real-time KPIs. As with its original service, Nerve Center leverages the company’s Sonar technology to automatically detect new topics and trends that matter to the enterprise.
“Broadly speaking, every large enterprise has to be doing social listening and social analytics,” CEO Nova Spivack told The Semantic Web Blog in an earlier interview, “including in realtime, which is one thing we specialize in. I don’t think any other product out there shows change as it happens as we do.” It’s important, he said, to understand that Bottlenose focuses on the discovery of trends, not just finding what users explicitly search for or track. Part of the release, he added, “will be some pretty powerful alerting to tell you when there is something to look at.”
Recent updates to YarcData’s software for its Urika analytics appliance reflect the fact that the enterprise is starting to understand the impact that semantic technology has on turning Big Data into actual insights.
The latest update includes integration with more enterprise data discovery tools, including the visualization and business intelligence tools Centrifuge Visual Network Analytics and TIBCO Spotfire, as well as those based on SPARQL and RDF, JDBC, JSON, and Apache Jena. The goal is to streamline the process of getting data in and then being able to provide connectivity to the tools analysts use every day.
As customers see the value of using the appliance to gain business insight, they want to be able to more tightly integrate this technology into wider enterprise workflows and infrastructures, says Ramesh Menon, YarcData vice president, solutions. “Not only do you want data from all different enterprise sources to flow into the appliance easily, but the value of results is enhanced tremendously if the insights and the ability to use those insights are more broadly distributed inside the enterprise,” he says. “Instead of having one analyst write queries on the appliance, 200 analysts can use the appliance without necessarily knowing a lot about the underlying, or semantic, technology. They are able to use the front end or discovery tools they use on a daily basis, not have to leave that interface, and still get the benefit of the Urika appliance.”
Nara, the service that to date has leveraged its neural networking technology to automate, personalize and curate web dining experiences for users, is making good on its previously stated intentions to help users find and take action across various consumer lifestyle categories. (See our original story on the company here.)
The company today is adding personalized hotel recommendations to its portfolio. Consumers now will be able to find hotels conforming to their “Digital DNA” – the sum of what its technology learns of what they do and don’t like – in 50 high-volume cities in the U.S. and Canada. It has entered into a non-exclusive partnership with Expedia to take care of booking on the back end and with TripAdvisor for its reviews, with both capabilities available to users without their having to leave the Nara site. The company expects to add more locations in North America in the future, as it did for its restaurant recommendations.
Bottlenose earlier this month raised $3.6 million in Series A funding to help with its launch of Bottlenose Enterprise, the upcoming tool aimed at helping large companies discover and visualize trends from among a host of data sources, measuring and comparing them for those with the most “trendfluence.” Users will get a realtime dynamic view of change as it happens and a host of analytics for automating insights, the company says.
The Enterprise edition will be a big departure from the current Bottlenose Lite version for individual professionals. That difference starts with the amount of data it can handle. “The free, Lite version looks only at public API data like Twitter’s. The enterprise version uses the firehose,” says CEO Nova Spivack. Another big difference is that the enterprise version adds a lot more views and analytics, in comparison to the personal-use edition, where its Sonar technology provides the chief service of real-time detection of talk around topics personalized to users’ interests so they can visualize and track those topics over time.
Spivack calls what Enterprise does “enterprise-scale trend detection in the cloud,” leveraging a massive Hadoop infrastructure and technologies including Cassandra, MongoDB, and the Storm distributed realtime computation system to process data for deep dives. The cloud handles the computation, and results are shared at the edge, where certain kinds of analytics and visualizations occur locally in the browser for a realtime experience with no latency. With sources such as social streams, stock information, even a company’s proprietary data, and more, the Enterprise version helps brands discover important trends like keywords to bid on or viral content to share, identify their influencers and detractors, see what sentiment and demographic movements are taking shape, and create correlations across data points, too.
Synthesys, Digital Reasoning’s machine learning platform that ferrets out meaning in unstructured data at scale, is bringing its smarts to compliance use cases for organizations, such as financial institutions. (See this article for more insight into the technology behind the company’s software.)
This week, the vendor delivered Version 3.7 of the Synthesys software, which brings with it the capability to monitor and analyze all email communications in near real-time. That matters to many compliance program use cases, among them insider trading, money laundering and reputation management. “They all go back to finding information inside of communications, like who are the people and organizations mentioned in email, and what is being discussed about them,” says Tim Estes, chairman and CEO. “Synthesys can take essentially millions of emails and winnow them to maybe a hundred that are problems.”
That means fewer things are falsely flagged as issues, there’s less intrusion on the privacy of innocent emails, and there’s more return on time for the people charged with protecting customers and enforcing compliance requirements.
Derrick Harris of GigaOM reports, “According to Cloudera CEO Mike Olson, his company has ‘decades’ in front of it in which to enhance its Hadoop platform to become the go-to place for data storage and analysis. At a Tuesday event in San Francisco, Cloudera announced the latest feature meant to further that strategy — full-text search. It comes just weeks after the company’s Impala interactive SQL query engine became publicly available. The general idea behind adding search (something competitor MapR actually did in May), is to let people without deep technical skills find the information they need within a Hadoop cluster in a way that’s familiar to them. ‘You don’t even have to understand what SQL is. You can just type words into a box,’ Olson said during a recent phone call, comparing Cloudera’s search to the process of finding information online or within your Gmail history.” Read more
Derrick Harris of GigaOM reports, “WibiData — the big data startup from Cloudera Co-founder Christophe Bisciglia and Aaron Kimball — doesn’t have overly big plans. It only wants to become one of the first, if not the first, company selling off-the-shelf software that lets other companies build valuable, customer-facing applications on Hadoop. On Thursday, WibiData announced $15 million in Series B funding from Canaan Partners, as well as existing investors NEA and Google Chairman Eric Schmidt, to help make the goal a reality.” Read more
Robin Wauters of The Next Web reports, “Seattle startup GraphLab claims it is building the ‘fastest machine-learning analytics engine for graph datasets’, based on the popular open-source distributed graph computation framework with the same name, and it has just raised capital to come through on its promise. Founded by scientists from the University of Washington, Carnegie Mellon and UC Berkeley, GraphLab today announced that it has secured $6.75 million in a financing round led by Madrona and NEA.” Read more