I'm becoming more and more convinced that Hadoop is going to be its own ecosystem that I must cover as a whole beat unto itself. Call it the Map/Reduce beat, call it the big-data crunching beat. Call it the elephant with its footprints in the butter. But no matter what you call it, companies are generating more data every day, and many of them aren't deriving business intelligence from said data. And that spells opportunity.
This morning, I spoke to Chris Wensel, cofounder of Scale Unlimited. His company specializes in Hadoop training and consulting, and he offered some good insight into how the Hadoop world is expanding. He set me straight: Hadoop is already its own ecosystem, made up of numerous related projects such as HBase, Pig, HDFS and all the other things folks have built to expand Hadoop's capabilities.
He also pointed out that there is a big gap between what Hadoop can do and what most companies need right now: That is, Hadoop is for big, slow data crunching, and many companies need smaller, faster solutions. That seems to be the expectation of super-startup Aster Data.
Wensel also argued that Hadoop should live in your data center. It's all fine and dandy to put up an Amazon instance and fill it with your data, but when you're crunching a petabyte, he said, it's just too expensive. That's why Amazon lets you mail them disks. And that's also why Hadoop should be inside the firewall. Try as we might, a petabyte isn't easy to push anywhere, even inside the network.
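To put rough numbers on why a petabyte is hard to push anywhere, here is a back-of-the-envelope sketch in Python. The link speeds are my own illustrative assumptions, not figures from Wensel, and the function name `transfer_days` is made up for this example. It assumes a decimal petabyte (10^15 bytes) and, by default, a link that runs flat out with no protocol overhead, which real networks never manage.

```python
# Back-of-the-envelope estimate: how long does it take to move a petabyte?
# All link speeds below are illustrative assumptions, not measured values.

PETABYTE_BYTES = 10**15      # decimal petabyte (assumption)
SECONDS_PER_DAY = 86_400


def transfer_days(num_bytes: float, link_bits_per_second: float,
                  efficiency: float = 1.0) -> float:
    """Return the days needed to move num_bytes over a link.

    efficiency is the fraction of the nominal link rate actually achieved
    (1.0 means a perfectly saturated link, which is optimistic).
    """
    if link_bits_per_second <= 0:
        raise ValueError("link speed must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = (num_bytes * 8) / (link_bits_per_second * efficiency)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical link speeds chosen only for illustration.
    for label, bps in [("100 Mbps", 100e6), ("1 Gbps", 1e9), ("10 Gbps", 10e9)]:
        days = transfer_days(PETABYTE_BYTES, bps)
        print(f"{label:>8}: {days:8.1f} days")
```

Under those assumptions, even a fully saturated 1 Gbps link needs roughly three months to move a single petabyte, which goes some way toward explaining why mailing disks is a serious option.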
So I'll be watching Hadoop like a hawk now that it's on my radar. We'll have lots to talk about, and with companies like Cloudera and semi-quiet startup Stampede in the mix, I'm sure there will continue to be innovation around the edges of the project. Of course, the core will continue to evolve too, but third parties have a way of increasing the visibility and scope of an open-source ecosystem. Let's hope that all these companies can play well together and contribute upstream.