SD TIMES BLOG

Hadoop everywhere

by Alex Handy 06/22/2009 06:12 PM EST

I've already posted a few bits on the next big thing in software development, but none of those candidates has been as obvious or as deserving as Hadoop. Named after a stuffed animal belonging to one of creator Doug Cutting's children, Hadoop is, in my opinion, the killer app for the cloud. Or, at the very least, it's the infrastructure upon which the cloud's killer apps will be built.

You've all been writing systems just like Hadoop since the dawn of computing: It's the infrastructure for building massive data-crunching applications. Your nightly batches. Your monthly customer survey results. Your hourly data sift. As clustered data processing solutions go, Hadoop is a fairly painless one to use, relatively speaking. Naturally, since it is only at version 0.20.0, processing large amounts of data in Hadoop is still a non-trivial task. It sounds like security is a big issue right now, and it currently takes around 20 to 30 minutes to get a crashed NameNode back up, which matters because the NameNode is the single point of failure for the entire cluster. But already there seems to be a vibrant community of people building the ecosystem that will eventually make Hadoop a must-have platform in your data center and in your external clouds.

At its core, Hadoop is an implementation of MapReduce coupled with a distributed file system. That means Hadoop manages the cluster, distributing the data and the work across its nodes; you just write the code needed for the actual data exploration. Yahoo is a big backer of the project and employs Cutting full-time. The company is said to have a 4,000-node Hadoop cluster up and running, with ZooKeeper acting as the sheriff when things go awry.
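
For readers who haven't met the map/reduce pattern before, here is a minimal single-process sketch of the division of labor described above. It is plain Python, not Hadoop's API: `map_words`, `reduce_counts` and `run_job` are hypothetical names invented for this illustration, and `run_job` merely stands in for the grouping and scheduling that a real cluster would spread across many machines.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: collapse all values emitted for one key into a total."""
    return word, sum(counts)


def run_job(
    lines: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Hypothetical stand-in for the framework: map, group by key, reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)  # the "shuffle": collect values per key
    return dict(reducer(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    sample = ["the cloud is the cluster", "the cluster is big"]
    print(run_job(sample, map_words, reduce_counts))
```

The only parts a developer would write are the two small functions at the top; everything inside `run_job` is the kind of plumbing the framework takes off your hands.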

The data loads I have heard about so far range from 40 terabytes to almost a petabyte crunched at once. That's a lot of customer data to sift through. I'm sure your business analytics people will be salivating when they start to play with Hive, Facebook's framework for querying data in Hadoop clusters via a SQL-like language.
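
To give a feel for why a SQL-like layer appeals to analysts, here is a rough sketch of the kind of aggregate question they tend to ask. It uses Python's built-in `sqlite3` module as a stand-in, since Hive itself needs a cluster to run; the `orders` table, its columns and the `top_customers` helper are all made up for the example, and nothing here is Hive syntax or Hive's API.

```python
import sqlite3


def top_customers(rows, limit=2):
    """Return the biggest spenders from (customer, amount) rows.

    SQLite stands in for a SQL-like query layer; the table and column
    names are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    query = """
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
        ORDER BY total DESC
        LIMIT ?
    """
    result = conn.execute(query, (limit,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [("acme", 120.0), ("globex", 75.5), ("acme", 30.0), ("initech", 10.0)]
    print(top_customers(sample))
```

The point is the shape of the question: a declarative group-and-sum that an analyst can write without touching the job code underneath.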
