
Apache graduates Hadoop incubator projects



Alex Handy
May 5, 2010 —  The Apache Incubator is getting Hadoop-heavy. The Apache Software Foundation today graduated a number of projects within that realm to its incubator. Meanwhile, at the head of the class, the Apache Traffic Server, a caching proxy server, moved up to become a top-level project at the Foundation.

“Becoming a Top-Level Project is a vote of confidence from the Foundation at large, demonstrating a project has proven its ability to be properly self-governed,” said Foundation chairman Jim Jagielski. “We are proud of our Committers' dedication in building robust communities under the ASF process known as 'The Apache Way.' ”

The new class of Apache Incubator projects consists entirely of former sub-projects of existing hatchling projects. This year's class consists of three former Lucene sub-projects and two former Hadoop sub-projects.

From Hadoop come Avro and HBase, two projects aimed at expanding the capabilities of the Hadoop platform.

HBase is aimed at building something similar to Google's Bigtable inside Hadoop. It gives Hadoop random read/write access to tables with potentially billions of rows. Avro is a fast data serialization system for Hadoop.

The remaining three new projects are Mahout, Nutch and Tika. Nutch is a Web search engine based on Lucene and formerly a sub-project of it; Tika is used to detect data types and provide analysis thereof.

Mahout, an effort to build artificial intelligence construction tools for Hadoop, is now a separate incubator project. Mahout 0.3, which came out in March, added a number of new parallel algorithms for recognizing text and adding pattern recognition primitives to applications.

Grant Ingersoll, cofounder of both Mahout and the Lucene-focused company Lucid Imagination, said that Mahout is aimed at replicating some of the basic building blocks needed to build intelligent systems based on non-structured data sets.

“Our main algorithms, now, are around clustering and categorization,” he said, describing the capabilities of Mahout. “We also added in frequent pattern mining and collaborative filtering for what some people call recommendation systems. We've also got some evolutionary capabilities as well.”

Earlier this year, the Apache Software Foundation created five new top-level projects, one of which is also associated with Hadoop: the Apache Unstructured Information Management Architecture (UIMA). The UIMA project was originally created by IBM and was donated to the Apache Software Foundation in 2006 as an incubator project. UIMA is a framework for analyzing unstructured data sets, such as natural language texts.

The Apache Cassandra project also became a top-level project in April of 2010. This NoSQL distributed database was recently updated to version 0.6.1, and gained the ability to push information stored in Cassandra into running Hadoop clusters.




 


Comments


05/07/2010 09:00:31 AM EST

Nice to see Lucene/Hadoop getting coverage. Several errors in this article:
* Mahout is not from Hadoop, it's from Lucene. It uses Hadoop.
* Nutch is not an incubating project. It's a Lucene sub-project that's about to become a TLP.

Otis Gospodnetic, United States

