RHIPE combines Hadoop and the R analytics language



Alex Handy
October 11, 2010 — Many of Revolution Analytics' largest customers in finance and retail had begun asking for a way to use the R analytics programming language with Hadoop. Fortunately for the company, one developer had been building exactly that piece of software for over a year. That project is RHIPE, and at Hadoop World in New York, Revolution Analytics announced that it had hired the man behind it.

Saptarshi Guha is now a consultant at Revolution Analytics, and he said that the RHIPE project is already quite usable, though it is only at version 0.63.

"I actually started developing RHIPE in the last few years of my Ph.D.," he said. "A lot of my colleagues had data sets that were getting increasingly bigger. They were hard-pressed to compute with this data across the cluster. I saw Hadoop and I saw R, which all data managers need to use, and the two were disjointed."

So he began working on RHIPE. Guha said his first goal was to give R users the same functionality and capabilities that Java developers already had in Hadoop. He said the project has now largely met that goal, and he has made it available for free.

When asked why he felt R was a good match for Hadoop, Guha said: "R has about 2,700 packages that bring every possible statistical algorithm and tool to the user. To apply algorithms to different subsets of data, there's no other language choice but R. The idea is to push R to the computing back end, rather than send the data to R."

To that end, RHIPE is distributed as a single R package, which must be installed on every node in the Hadoop cluster. Once it is running, Guha said, it supports two workflows for developers.

The first workflow lets developers work with data stored in Hadoop interactively, from within R, in a command-line-style session. Because they can manipulate the data in near real time, he said, R developers can produce data visualizations in minutes. The second workflow resembles the traditional Hadoop workflow: Developers write jobs, submit them to the cluster, and come back when the jobs are done.

Mike Minelli, VP of sales at Revolution Analytics, said Hadoop will now be a focus for R. "[Guha] is now working with us closely. We hired him because Revolution Analytics will be working to basically marry up Hadoop and Scale R, which is a component that makes R scream as far as speed and the ability to handle large sets of data.

"The next generation is to marry those two components, so those computations happen fast in R in Hadoop."
