The top five most powerful Hadoop projects

Alex Handy
June 1, 2011 —  (Page 1 of 5)
Running Hadoop jobs and keeping a cluster up is a full time administrator job. Setting up jobs and parceling them out to the cluster can be difficult, especially if you're building multiple applications that rely on Hadoop inputs and outputs. Stitching these external applications together can be a pain, and bringing their needs together inside of map/reduce jobs can make things even more complicated.

We'll let the Cascading project describe itself here:

As a library and API that can be driven from any JVM-based language (Jython, JRuby, Groovy, Clojure, etc.), developers can create applications and frameworks that are "operationalized". That is, a single deployable JAR can be used to encapsulate a series of complex and dynamic processes all driven from the command line or a shell, instead of using external schedulers to glue many individual applications together with XML against each individual command line interface.

Not only does Cascading abstract away from map/reduce, it also allows developers to deploy their jobs in a much simpler way. That being done, cascading allows developers to build applications on top of Hadoop with a much simpler design model.

The cascading tree also includes a number of tools based on the system. One of those tools, dubbed the “Multitool,” even allows developers to run grep and sed jobs right against the cluster's data set.

Related Search Term(s): Hadoop

Pages 1 2 3 4 5 

Share this link:

Doug Cutting: Why Hadoop is still No. 1
How Hadoop become the de facto standard, and what it plans on doing next Read More...

News on Monday  more>>
Android Developer News  more>>
SharePoint Tech Report  more>>
Big Data TechReport  more>>



Download Current Issue

Need Back Issues?

Want to subscribe?