Posts Tagged ‘Hadoop’

GraphLab Create Aims To Be The Complete Package For Data Scientists

Data scientists can add another tool to their toolset today: GraphLab has launched GraphLab Create 1.0, which bundles everything from data cleaning and engineering tools through state-of-the-art machine learning and predictive analytics capabilities.

Think of it, company execs say, as the single platform that data scientists and engineers can leverage to unleash their creativity in building new data products, writing code at scale on their own laptops. The driving concept behind the product, they say, is to make large-scale machine learning and predictive analytics easy enough that companies won’t have to hire huge teams of data scientists and engineers or build the big hardware infrastructures behind many of today’s Big Data-intensive products. And the data scientists and engineers who do use it won’t need to be experts in machine-learning algorithms; they just need to be experienced enough to write Python code.

Semantic Web Job: Big Data Architect

New York’s Tektree Systems is in need of a Big Data Architect. The job description states, “Hadoop Data Architect with both hands-on Big Data and relational experience and deep knowledge of physical data modeling, data organization and storage technology, experienced with high volumes and able to architect and implement multi-tier solutions using the right technology in each tier, based on fit. Required Skills and Qualifications:

  • Design and development of data models for a new HDFS Master Data Reservoir and one or more relational or object Current Data environments
  • Design of optimum storage allocation for the data stores in the architecture.
  • Development of data frameworks for code implementation and testing across the program
  • Knowledge and experience with RDF and other Semantic technologies
  • Participation in code reviews to assure that developed and tested code conforms with the design and architecture principles
  • QA and testing of modules/applications/interfaces.
  • End-to-End project experience through to completion and supervise turnover to Operations staff.
  • Preparation of documentation of data architecture, designs and implemented code”.

Big Data Challenges In Banking And Securities

Photo courtesy: Johan Hansson, https://www.flickr.com/photos/plastanka/

A new report from the Securities Technology Analysis Center (STAC), Big Data Cases in Banking and Securities, looks to understand big data challenges specific to banking by studying 16 projects at 10 of the top global investment and retail banks.

According to the report, about half the cases involved a petabyte or more of data. That data included both natural-language text and highly structured formats, and the structured data itself showed a great deal of variety (for example, different departments using the same field for different purposes, or using different vocabularies for the same purpose), which in some cases made integration a challenge. The analytic complexity of the workloads studied, the Intel-sponsored report notes, ranged from basic transformations at the low end to machine learning at the high end.
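
To make that integration problem concrete, here is a minimal Python sketch of harmonizing two departments’ records into one canonical schema before combining them. The department names, field names and codes are all hypothetical. The point is that a shared field name (`ref`) can carry different meanings per source, while different codes (`B` vs. `PURCHASE`) can carry the same one, so the mappings have to be defined per source rather than globally.

```python
from __future__ import annotations

from typing import Any

# Hypothetical per-source field mappings: local name -> canonical name.
# "ref" is a trade ID in one department but a client ID in the other.
FIELD_MAP: dict[str, dict[str, str]] = {
    "equities": {"ref": "trade_id", "cpty": "counterparty", "side": "side"},
    "retail": {"ref": "client_id", "counterparty": "counterparty", "direction": "side"},
}

# Hypothetical per-source code translations, keyed by canonical field name.
VALUE_MAP: dict[str, dict[str, dict[str, str]]] = {
    "equities": {"side": {"B": "BUY", "S": "SELL"}},
    "retail": {"side": {"PURCHASE": "BUY", "SALE": "SELL"}},
}


def harmonize(source: str, record: dict[str, Any]) -> dict[str, Any]:
    """Rename fields and translate coded values into the canonical schema.

    Fields with no agreed canonical meaning are dropped rather than guessed at.
    """
    out: dict[str, Any] = {}
    for local_name, value in record.items():
        canonical = FIELD_MAP[source].get(local_name)
        if canonical is None:
            continue
        codes = VALUE_MAP[source].get(canonical, {})
        out[canonical] = codes.get(value, value)  # untranslated values pass through
    return out


print(harmonize("equities", {"ref": "T-1", "cpty": "ACME", "side": "B"}))
print(harmonize("retail", {"ref": "C-9", "counterparty": "ACME", "direction": "SALE"}))
```

Keeping the mappings as data rather than code means a new source can be onboarded by adding table entries, which matters when, as in the report, vocabularies differ department by department.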

Additional Funding For Elasticsearch To Help Company Complement Its Real-Time Search And Analytics Stack

Elasticsearch, whose Elasticsearch, Logstash and Kibana products for discovering and extracting insights from structured and unstructured data were discussed on this blog earlier this year, has raised $70 million in Series C financing from New Enterprise Associates (NEA). Benchmark Capital and Index Ventures also participated in the round. That brings the company’s total funding to $104 million over the past 18 months.

“Nearly all companies, start-ups and Fortune 500 enterprises alike, need to be able to slice and dice rapidly expanding data volumes in real time,” says Steven Schuurman, co-founder and CEO. Schuurman says the funding will go toward expanding sales, marketing and support staff and efforts, and toward developing more complementary products that work with the ELK (Elasticsearch, Logstash, Kibana) stack.

“Ultimately, this round of funding will help us get to our goal, faster, of making the ELK stack the de facto platform for businesses to gain actionable insights from their data,” he says.

Skytree Supports Big Data Analytics in Hadoop With Hortonworks Data Platform

SAN JOSE, CA–(Marketwired – Jun 3, 2014) - Skytree®, the Machine Learning Company®, today announced that its predictive analytics software is now available on Apache Hadoop YARN to deliver agile analytics on Hadoop clusters. Skytree’s flagship product — Skytree Server® — is built to provide high-performance Machine Learning and takes advantage of the multi-workload capabilities enabled by YARN’s increased reliability, scalability and manageability.

Gartner Uncovers Who’s Cool In The Supply Chain

Photo courtesy: Flickr/a loves dc

Gartner recently released its report “Cool Vendors in Supply Chain Services,” which gives kudos to providers that use cloud computing as an enabler or delivery mechanism for capabilities that help enterprises better manage their supply chains.

On that list of vendors building cloud solutions and leveraging big data and analytics to optimize the supply chain is startup Elementum, which The Semantic Web Blog has covered before and which envisions the supply chain as a complex graph of connections. As we reported then, Elementum’s back end is built on real-time Java, a MongoDB NoSQL document database, and a flexible-schema graph database that stores and maps the nodes and edges of a supply chain graph. URIs identify data resources and metadata, and a federated platform query language makes it possible to access multiple types of data through a URI, regardless of which type of database stores that data. Mobile apps let end users manage transportation networks, respond to supply chain risks, and monitor the health of the supply chain.
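
The URI-as-shared-key idea can be illustrated with a toy Python sketch in which two in-memory dictionaries stand in for a document store and a graph store, and a single URI pulls data from both. This is a hypothetical illustration, not Elementum’s implementation or its query language; the `urn:sc:` identifiers and record contents are invented.

```python
from __future__ import annotations

from collections import deque

# Stand-in for a document database: URI -> metadata (hypothetical records).
DOCUMENTS: dict[str, dict[str, str]] = {
    "urn:sc:supplier/acme": {"name": "Acme Parts", "type": "supplier"},
    "urn:sc:factory/f1": {"name": "Plant 1", "type": "factory"},
    "urn:sc:dc/east": {"name": "East DC", "type": "distribution center"},
}

# Stand-in for a graph database: URI -> URIs of downstream nodes.
EDGES: dict[str, list[str]] = {
    "urn:sc:supplier/acme": ["urn:sc:factory/f1"],
    "urn:sc:factory/f1": ["urn:sc:dc/east"],
    "urn:sc:dc/east": [],
}


def resolve(uri: str) -> dict[str, object]:
    """Merge what each store knows about one URI into a single view."""
    return {"uri": uri, **DOCUMENTS.get(uri, {}), "ships_to": EDGES.get(uri, [])}


def downstream(uri: str) -> list[str]:
    """Breadth-first walk returning every node reachable from `uri`."""
    seen, queue, order = {uri}, deque([uri]), []
    while queue:
        for nxt in EDGES.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Which sites are exposed if the supplier is disrupted, and what are they?
for node in downstream("urn:sc:supplier/acme"):
    print(resolve(node)["name"])
```

Because both stores are keyed by the same URI, the caller never needs to know which store holds which part of a record; that is the property a federated query layer relies on, however it is actually implemented.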

Gartner analyst Michael Dominy writes in the report that Elementum earns its cool designation in part for its exploitation of Gartner’s Nexus of Forces, which the research firm describes as the convergence and mutual reinforcement of social, mobility, cloud and information patterns that drive new business scenarios.

Let Your Enterprise Graph Tell You A Story

Every picture tells a story, don’t it? Well, it turns out that’s as true in the enterprise as it is on our Facebook pages. In this case, the picture is the enterprise graph of the workforce: who interacts with whom, when, and in what context. And the story is what the patterns of interaction revealed by the graph may say about employee engagement and influence, and how to better leverage all that for the benefit of both the business and its employees.

When Marie Wallace, an IBM analytics strategist, looks at social and collaborative networks, other sources of enterprise communication, and business-process channels such as CRM systems, “I am interested in the narrative,” she told an audience at the Sentiment Analytics Symposium earlier this month. “There is a lot of information in CRM systems – who met with whom, what industry the client is in, what products were presented. All this is valuable and contributes to the enterprise graph.”
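
As a rough illustration of how records like these add up to a graph, the Python sketch below turns hypothetical CRM meeting records into an employee-client graph and computes two simple signals: how many distinct clients each employee has met (node degree) and which colleagues share clients (two-hop neighbors). These are generic graph measures chosen for illustration, not IBM’s analytics; the names and records are invented.

```python
from collections import Counter, defaultdict

# Hypothetical CRM-style meeting records: (employee, client, date, product).
MEETINGS = [
    ("alice", "acme", "2014-05-02", "analytics"),
    ("bob", "acme", "2014-05-03", "storage"),
    ("alice", "globex", "2014-05-05", "analytics"),
    ("carol", "globex", "2014-05-06", "analytics"),
    ("alice", "initech", "2014-05-07", "cloud"),
]


def build_graph(meetings):
    """Undirected graph linking each employee to each client they met."""
    graph = defaultdict(set)
    for employee, client, _date, _product in meetings:
        graph[employee].add(client)
        graph[client].add(employee)
    return graph


def client_reach(graph, employees):
    """Degree of each employee node: the number of distinct clients met."""
    return Counter({e: len(graph[e]) for e in employees})


def shared_client_colleagues(graph, employee):
    """Colleagues two hops away: people who met at least one of the same clients."""
    return {other for client in graph[employee] for other in graph[client] if other != employee}


g = build_graph(MEETINGS)
print(client_reach(g, ["alice", "bob", "carol"]).most_common(1))
print(sorted(shared_client_colleagues(g, "alice")))
```

The date and product fields are carried in the records but unused here; they are the “when” and “in what context” that a richer analysis could attach to each edge.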

NoSQL’s Data Modeling Advantages

Jennifer Zaino recently wrote an article for our sister website DATAVERSITY on the evolving field of NoSQL databases. Zaino wrote, “Hadoop HBase. MongoDB. Cassandra. Couchbase. Neo4J. Riak. Those are just a few of the sprawling community of NoSQL databases, a category that originally sprang up in response to the internal needs of companies such as Google, Amazon, Facebook, LinkedIn, Yahoo and more – needs for better scalability, lower latency, greater flexibility, and a better price/performance ratio in an age of Big Data and Cloud computing. They come in many forms, from key-value stores to wide-column stores to data grids and document, graph, and object databases. And as a group – however still informally defined – NoSQL (considered by most to mean ‘not only SQL’) is growing fast. The worldwide NoSQL market is expected to reach $3.4 billion by 2018, growing at a CAGR of 21 percent between last year and 2018, according to Market Research Media.”
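
To show what those forms mean for data modeling, the sketch below represents one hypothetical customer in four of the shapes Zaino lists, using plain Python dictionaries and lists as stand-ins. It is illustrative only: it does not use the HBase, MongoDB, Cassandra, Couchbase, Neo4J or Riak APIs, and it omits data grids and object databases.

```python
import json

# One hypothetical customer record, modeled four ways with plain Python
# structures standing in for the databases (no real database API is used).

# Key-value: the store sees one opaque value per key.
key_value = {"customer:42": json.dumps({"name": "Ada", "city": "London"})}

# Document: the value is structured, so nested fields are addressable.
document = {"_id": 42, "name": "Ada", "address": {"city": "London"}}

# Wide-column: row key -> column family -> column -> value; rows in the
# same table may carry different columns without a schema change.
wide_column = {"42": {"profile": {"name": "Ada", "city": "London"},
                      "orders": {"1001": "2014-03-01", "1002": "2014-04-12"}}}

# Graph: entities are nodes and relationships are first-class edges.
nodes = {"customer:42": {"name": "Ada"}, "order:1001": {}, "order:1002": {}}
edges = [("customer:42", "PLACED", "order:1001"),
         ("customer:42", "PLACED", "order:1002")]

# Reading the city: decode the whole value vs. address the field directly.
city_kv = json.loads(key_value["customer:42"])["city"]
city_doc = document["address"]["city"]
city_wc = wide_column["42"]["profile"]["city"]

# Following a relationship is an edge lookup rather than a join.
orders = [dst for src, rel, dst in edges if src == "customer:42" and rel == "PLACED"]
print(city_kv, city_doc, city_wc, orders)
```

The trade-off the sketch hints at: the key-value shape is simplest to scale but opaque to the store, while the document, wide-column and graph shapes expose more structure in exchange for a richer data model.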

Hadoop Highlights: YARN-Ready Apps And Machine Learning Magic

Next week Hadoop World takes place in New York City. The big event follows on the heels of last week’s official gold release of Apache Hadoop 2.0, which significantly overhauls how Hadoop runs MapReduce, the programming model for processing large data sets with a parallel, distributed algorithm on a cluster.

Sitting on top of the Hadoop Distributed File System (HDFS), YARN (Yet Another Resource Negotiator) is meant to act as a large-scale, distributed operating system for big data applications. Multiple applications can now run in Hadoop at the same time, with a global ResourceManager and per-node NodeManagers providing a generic system for managing those applications in a distributed way.
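
The division of labor described above can be sketched as a toy Python model: one global ResourceManager places container requests from several applications onto per-node NodeManagers. The class names mirror YARN’s roles, but the code is hypothetical and is not YARN’s API. The first-fit placement stands in for YARN’s real schedulers (such as the Capacity and Fair schedulers), memory is the only resource modeled, and per-application ApplicationMasters are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """One worker node; tracks free memory and the containers it hosts."""
    name: str
    free_mb: int
    containers: list[tuple[str, int]] = field(default_factory=list)


@dataclass
class ResourceManager:
    """Single global scheduler that places container requests on nodes."""
    nodes: list[NodeManager]

    def allocate(self, app: str, memory_mb: int) -> str | None:
        """First-fit: use the first node with enough free memory."""
        for node in self.nodes:
            if node.free_mb >= memory_mb:
                node.free_mb -= memory_mb
                node.containers.append((app, memory_mb))
                return node.name
        return None  # no capacity left; a real scheduler would queue the request


rm = ResourceManager([NodeManager("node1", 4096), NodeManager("node2", 4096)])
# Two applications share the same cluster, with requests interleaved.
requests = [("mapreduce-job", 2048), ("giraph-job", 3072),
            ("mapreduce-job", 2048), ("giraph-job", 3072)]
placements = [rm.allocate(app, mb) for app, mb in requests]
print(placements)
```

The point of the model is the separation: applications only ask for containers, and one cluster-wide component decides where they run, which is what lets MapReduce and non-MapReduce workloads share a Hadoop 2.0 cluster.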

Among the YARN-ready applications is Apache Giraph, an iterative graph processing system built for high scalability. Giraph is also the programming framework that helps power Facebook’s Graph Search service of connections across friends, subscriptions, and so on, giving Facebook a simple way to express a wide range of graph algorithms and scale them to massive datasets. Facebook explained in a post in August that it had modified Giraph and used it to analyze a trillion edges, or connections between different entities, in under four minutes.
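
Giraph follows the vertex-centric, bulk synchronous parallel model introduced by Google’s Pregel: computation proceeds in supersteps, and in each one every vertex reads the messages sent to it in the previous superstep, updates its value, and sends messages along its out-edges. The single-process Python sketch below runs PageRank that way. It is a toy illustration of the model, not Giraph’s Java API, and it assumes every vertex has at least one out-edge (dangling vertices would leak rank).

```python
from collections import defaultdict


def pagerank_supersteps(out_edges, supersteps=30, damping=0.85):
    """Pregel-style PageRank on a dict: vertex -> list of out-neighbors."""
    n = len(out_edges)
    value = {v: 1.0 / n for v in out_edges}
    inbox = defaultdict(list)
    for step in range(supersteps):
        # Compute phase: from superstep 1 on, each vertex folds in the
        # rank shares it received during the previous superstep.
        if step > 0:
            for v in out_edges:
                value[v] = (1 - damping) / n + damping * sum(inbox[v])
        # Messaging phase: each vertex sends an equal share of its rank
        # to every out-neighbor.
        outbox = defaultdict(list)
        for v, targets in out_edges.items():
            for t in targets:
                outbox[t].append(value[v] / len(targets))
        # Barrier: messages become visible only in the next superstep.
        inbox = outbox
    return value


ranks = pagerank_supersteps({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
print({v: round(r, 3) for v, r in ranks.items()})
```

Each vertex only ever touches its own value and its own messages, which is why the same program can be spread across many workers, with the superstep barrier as the only global synchronization point.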

Bottlenose Nerve Center Debuts, Bringing The Artificial Analyst To The Enterprise

The enterprise version of Bottlenose has formally launched under the name Nerve Center. The service, which provides real-time trend intelligence for brands and businesses and which The Semantic Web Blog previewed earlier, includes a dashboard featuring live visualization of all trending topics, hashtags and people; top positive and negative influences and sentiment trends; trending images, videos, links and popular messages; the ability to view trending messages by type (complaints vs. endorsements, for example); and real-time KPIs. As with its original service, Nerve Center leverages the company’s Sonar technology to automatically detect new topics and trends that matter to the enterprise.

“Broadly speaking, every large enterprise has to be doing social listening and social analytics,” CEO Nova Spivack told The Semantic Web Blog in an earlier interview, “including in realtime, which is one thing we specialize in. I don’t think any other product out there shows change as it happens as we do.” It’s important, he said, to understand that Bottlenose focuses on the discovery of trends, not just finding what users explicitly search for or track. Part of the release, he added, “will be some pretty powerful alerting to tell you when there is something to look at.”
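
For readers curious what alerting “when there is something to look at” can involve, here is a generic Python sketch of one textbook approach: flag a term when its count in the latest time window sits far above its recent baseline, or when a new term appears with meaningful volume. It is an assumption-laden illustration, not how Bottlenose’s Sonar works; the `k` and `min_count` thresholds and the hashtag counts are hypothetical.

```python
from statistics import mean, pstdev


def trending(history, current, k=3.0, min_count=5):
    """Return terms whose latest count is a spike relative to their history.

    history: term -> counts in previous windows
    current: term -> count in the latest window
    """
    alerts = []
    for term, count in current.items():
        if count < min_count:
            continue  # too little volume to be worth an alert
        past = history.get(term, [])
        if not past:
            alerts.append(term)  # new term with meaningful volume
            continue
        # Floor the spread at 1 so perfectly flat histories don't alert on +1.
        threshold = mean(past) + k * max(pstdev(past), 1.0)
        if count > threshold:
            alerts.append(term)
    return sorted(alerts)


history = {"#outage": [2, 3, 2, 3], "#sale": [40, 42, 38, 41]}
current = {"#outage": 30, "#sale": 43, "#newphone": 12, "#typo": 1}
print(trending(history, current))
```

Comparing each term against its own baseline, rather than ranking by raw volume, is what surfaces a small but sudden topic like `#outage` while ignoring a large but steady one like `#sale`, which is the distinction between discovering trends and merely tracking known terms.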
