SD TIMES BLOG
Hadoop everywhere

by Alex Handy 06/22/2009 06:12 PM EST

I've posted a few bits already on the next big thing in software development, but none of those candidates has been as obvious or as deserving as Hadoop. Named after a stuffed animal belonging to one of creator Doug Cutting's children, Hadoop is, in my opinion, the killer app for the cloud. Or, at the very least, it's the infrastructure upon which the cloud's killer apps will be built.

You've all been writing systems just like Hadoop since the dawn of computing: It's infrastructure for building massive data-crunching applications. Your nightly batches. Your monthly customer survey results. Your hourly data sift. As clustered data processing solutions go, Hadoop is a fairly painless one to use, relatively speaking. Still, processing large amounts of data in Hadoop remains a non-trivial task at present: The project is only at version 0.20.0. It sounds like security is a big issue right now, and it currently takes around 20 to 30 minutes to get a crashed NameNode back up. That matters because the NameNode is the single point of failure for the entire cluster. But already there seems to be a vibrant community of people building the ecosystem that will eventually make Hadoop a must-have platform in your data center and in your external clouds.

At its core, Hadoop is an implementation of map/reduce coupled with a distributed file system. That means Hadoop manages the cluster, splitting the work across nodes and collecting the results; you just write the code needed for the actual data exploration. Yahoo is a big backer of the project and employs Cutting full-time. The company is said to have a 4,000-node Hadoop cluster up and running, with ZooKeeper acting as the sheriff when things go awry.
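
For anyone who hasn't seen map/reduce up close, here is a minimal single-machine sketch of the idea in plain Python. It does not use Hadoop's API at all; the function names (map_phase, shuffle, reduce_phase, run_job) are made up for illustration, and the word-count job is just the customary toy example. On a real cluster, the framework would run the map and reduce steps on many nodes and handle the grouping in between.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key. On a cluster, the framework does this part."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine every value seen for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence on one machine."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(["the cloud is the platform", "the cluster is big"]))
```

The point of the split is that map_phase and reduce_phase only ever look at one record or one key at a time, which is what lets a system like Hadoop spread them across thousands of machines.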

The data loads I have heard about so far range from 40 terabytes to almost a petabyte crunched at once. That's a lot of customer data to sift through. I'm sure your business analytics people will be salivating when they start to play with Hive, Facebook's framework for querying data stored in Hadoop clusters via a SQL-like language.
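
To give a feel for why a SQL-like layer fits on top of this kind of system, here is a rough Python sketch of how a simple aggregate query can be phrased as a map step and a reduce step. This is not Hive's actual execution plan or API; the orders data, the column names and the functions map_row, reduce_group and run_query are all hypothetical, chosen only to mirror a query along the lines of SELECT country, SUM(order_total) FROM orders WHERE order_total >= 10 GROUP BY country.

```python
from collections import defaultdict

# Hypothetical rows: (customer_id, country, order_total)
ORDERS = [
    (1, "US", 25),
    (2, "DE", 5),
    (3, "US", 40),
    (4, "DE", 15),
]


def map_row(row):
    """WHERE plus projection: emit (country, total) for rows passing the filter."""
    _, country, total = row
    if total >= 10:
        yield country, total


def reduce_group(country, totals):
    """GROUP BY plus SUM: collapse all totals for one country."""
    return country, sum(totals)


def run_query(rows):
    """Single-machine stand-in for the map, group and reduce stages."""
    groups = defaultdict(list)
    for row in rows:
        for key, value in map_row(row):
            groups[key].append(value)
    return dict(reduce_group(k, v) for k, v in groups.items())


if __name__ == "__main__":
    print(run_query(ORDERS))
```

The analyst writes the one-line query; a layer like Hive is responsible for turning it into jobs of roughly this shape.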

Share this link: http://www.sdtimes.com/blog/1459
