Hive, Pig fight for Hadoop supremacy



July 27, 2009 —  (Page 1 of 2)
Facebook's internal 600-node Hadoop cluster was confounding its users. While Hadoop served up terabytes of data for business intelligence, Pig, the language designed for writing queries against that data, was slowing things down because business intelligence users needed training before they could use it. That's why the Facebook development team decided to write Hive.

“Moving from Hadoop to Hive opened up a lot more options for us,” said Bobby Johnson, director of engineering at Facebook. “There wasn't this steep learning curve; that was really what kicked it off. We had started with Hadoop and we thought this was really powerful, but we thought we had to get that power into the hands of people with less friction.”

Ashish Thusoo, engineering manager at Facebook, said that Hive allows for more flexibility than Pig, and the fact that its syntax is based on SQL makes it more accessible to traditional business users.

“Hive is very extensible,” he said. “You can plug in your own scripts and your own processing logic into the workflow. You can manage different kinds of data formats. It's a very extensible and flexible system.”
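As a rough illustration of what plugging in a script can look like: Hive can stream rows to an external program as tab-separated lines on standard input and read the program's output lines back as rows. The sketch below is a minimal Python script in that style. The two-column layout (a user ID and a URL) and the function names are hypothetical, chosen only for the example, and are not taken from Facebook's setup.

```python
import sys
from urllib.parse import urlparse


def transform(lines):
    """Turn 'user_id<TAB>url' lines into 'user_id<TAB>host' lines.

    The column layout is an assumption made for this example.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # skip rows that do not match the assumed layout
        user_id, url = fields
        host = urlparse(url).netloc  # keep only the host part of the URL
        yield f"{user_id}\t{host}"


def main():
    # When used as a streaming script, rows arrive on stdin, one per line.
    for out in transform(sys.stdin):
        print(out)
```

A script like this would call `main()` when launched by the query engine; the logic itself lives in `transform`, so it can be checked on its own with a plain list of strings.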

The Hadoop team has taken notice of Hive's newfound success. Eric Baldeschwieler, vice president of grid computing at Yahoo, leads the team there that works on Hadoop and Pig. He admitted that Pig now accounts for around half of the language layers used on top of Hadoop. Currently, Pig does not behave like SQL, and Baldeschwieler said that his team understands that this is a sticking point.

“It's clear that being able to support SQL will increase the accessibility of Pig,” said Baldeschwieler. “We are working on a SQL API to Pig. We're also working on columnar storage, which will improve the I/O performance of Pig and SQL.”

The biggest difference between Pig and SQL, said Baldeschwieler in explaining why Pig didn't take its initial cues from the standard database query language, is that “SQL is a declarative language, where you describe the output you want. Pig is a procedural language, where you describe the steps you would like performed. There is a constituency that likes that more. It also has the same basic Boolean logic you find in SQL, as well as tables and joins… But the way you manipulate those objects is more procedural.”
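The contrast Baldeschwieler draws can be sketched in a few lines of Python. This is an illustration only, not Pig or Hive code: the sample data and function names are made up, SQLite stands in for a declarative engine, and plain Python steps stand in for a procedural data flow. Both functions compute the same per-country totals.

```python
import sqlite3

# Hypothetical sample data: (user, country) and (user, page_views).
users = [("alice", "US"), ("bob", "UK"), ("carol", "US")]
views = [("alice", 10), ("bob", 5), ("carol", 7), ("alice", 3)]


def declarative_totals():
    """SQL style: describe the result and let the engine choose the steps."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, country TEXT)")
    conn.execute("CREATE TABLE views (name TEXT, n INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", users)
    conn.executemany("INSERT INTO views VALUES (?, ?)", views)
    rows = conn.execute(
        "SELECT u.country, SUM(v.n) FROM users u "
        "JOIN views v ON u.name = v.name "
        "GROUP BY u.country ORDER BY u.country"
    ).fetchall()
    conn.close()
    return rows


def procedural_totals():
    """Data-flow style: spell out each step (join, group, sum) in order."""
    country_of = dict(users)                                # step 1: load
    joined = [(country_of[name], n) for name, n in views]   # step 2: join
    grouped = {}
    for country, n in joined:                               # step 3: group
        grouped.setdefault(country, []).append(n)
    return sorted((c, sum(ns)) for c, ns in grouped.items())  # step 4: sum
```

In the first function the join order and grouping strategy are left to the engine; in the second, the author fixes them by writing the steps out, which is the trade-off the quote describes.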


