Hadoop World takes place in New York City next week. The event follows on the heels of last week's official gold release of Apache Hadoop 2.0, which significantly overhauls the MapReduce programming model for processing large data sets with a parallel, distributed algorithm on a cluster.
Sitting on top of the Hadoop Distributed File System (HDFS), YARN (Yet Another Resource Negotiator), the resource-management layer introduced in the release, is meant to serve as a large-scale, distributed operating system for big data applications. Multiple apps can now run at the same time in Hadoop, with a global ResourceManager and per-node NodeManagers providing a generic system for managing the applications in a distributed way.
Among the YARN-ready applications is Apache Giraph, an iterative graph processing system built for high scalability. It is also the programming framework that helps Facebook with its Graph Search service of connections across friends, subscriptions, and so on, giving the company the means to express a wide range of graph algorithms in a simple way and scale them to massive datasets. Facebook explained in an August post that it had modified Giraph and used it to analyze a trillion edges, or connections between different entities, in under four minutes.
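To give a rough sense of what "iterative graph processing" means, here is a minimal single-machine sketch in plain Python. It does not use Giraph's API, and the function name `connected_components` is hypothetical. It assumes a vertex-centric, message-passing style: in each round, every vertex that received messages updates its own value and, if that value changed, notifies its neighbors. The loop stops once no messages remain. The example labels each vertex with the smallest vertex id it can reach, which identifies connected groups.

```python
from collections import defaultdict


def connected_components(edges):
    """Label every vertex with the smallest vertex id in its component.

    Illustrative sketch only: a sequential stand-in for the kind of
    round-by-round computation a distributed graph system would run.
    """
    # Build an undirected adjacency list from the edge pairs.
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Each vertex starts out labeled with its own id.
    label = {v: v for v in neighbors}

    # Round 0: every vertex sends its label to all of its neighbors.
    inbox = defaultdict(list)
    for v in neighbors:
        for n in neighbors[v]:
            inbox[n].append(label[v])

    # Later rounds: a vertex acts only on the messages it received in
    # the previous round, and the loop halts when no messages are left.
    while inbox:
        outbox = defaultdict(list)
        for v, messages in inbox.items():
            smallest = min(messages)
            if smallest < label[v]:
                label[v] = smallest
                for n in neighbors[v]:
                    outbox[n].append(smallest)
        inbox = outbox

    return label


if __name__ == "__main__":
    # Two separate groups: {1, 2, 3} and {4, 5}.
    print(connected_components([(1, 2), (2, 3), (4, 5)]))
```

A distributed system would spread the vertices across many machines and deliver the messages over the network between rounds. The sketch keeps everything in one process to show only the shape of the computation.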