Apache graduates Hadoop incubator projects
May 5, 2010 —
The Apache Incubator is getting Hadoop-heavy. The Apache Software Foundation today promoted a number of projects in that realm to stand-alone status in its Incubator. Meanwhile, at the head of the class, Apache Traffic Server, a caching proxy server, moved up to become a top-level project at the Foundation.
“Becoming a Top-Level Project is a vote of confidence from the Foundation at large, demonstrating a project has proven its ability to be properly self-governed,” said Foundation chairman Jim Jagielski. “We are proud of our Committers' dedication in building robust communities under the ASF process known as 'The Apache Way.' ”
The new class of Apache Incubator projects is made up entirely of former sub-projects of existing top-level projects. This year's class consists of three former Lucene sub-projects and two former Hadoop sub-projects.
From Hadoop come Avro and HBase, two projects aimed at expanding the capabilities of the Hadoop platform.
HBase is aimed at building something similar to Google's Bigtable on top of Hadoop. It gives Hadoop random read/write access to tables with potentially billions of rows. Avro is a fast data serialization system for Hadoop.
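The Bigtable data model that HBase implements can be pictured as a sparse, sorted, versioned map from row key and column to value, which is what makes fast random reads, writes, and range scans possible at that scale. Below is a minimal pure-Python sketch of that model (illustrative only; the class and method names are invented for this article and are not the real HBase API):

```python
from collections import defaultdict

class SparseTable:
    """Toy sketch of a Bigtable/HBase-style data model: a sparse,
    sorted map of row key -> column -> timestamped versions of a value.
    Not the real HBase client API; for illustration only."""

    def __init__(self):
        # row -> column -> list of (timestamp, value), newest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, timestamp):
        # Random write: append a new version of this cell
        cells = self._rows[row][column]
        cells.append((timestamp, value))
        cells.sort(reverse=True)  # keep the newest version first

    def get(self, row, column):
        # Random read: return the newest value for this cell, if any
        cells = self._rows.get(row, {}).get(column)
        return cells[0][1] if cells else None

    def scan(self, start_row, stop_row):
        # Rows are kept sorted by key, so range scans are cheap
        for key in sorted(self._rows):
            if start_row <= key < stop_row:
                yield key, {c: v[0][1] for c, v in self._rows[key].items()}
```

For example, writing two versions of the same cell and reading it back returns the newest version, and a scan over a key range returns only the rows whose keys fall inside it.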
The remaining three new projects are Mahout, Nutch and Tika. Nutch is a Web search engine based on, and formerly a sub-project of, Lucene; Tika detects document types and extracts content and metadata from them.
Mahout, an effort to build artificial intelligence construction tools for Hadoop, is now a separate incubator project. Mahout 0.3, which came out in March, added a number of new parallel algorithms for text recognition and for adding pattern-recognition primitives to applications.

Grant Ingersoll, cofounder of Mahout and of Lucene company Lucid Imagination, said that Mahout is aimed at replicating some of the basic building blocks needed to build intelligent systems based on unstructured data.
“Our main algorithms, now, are around clustering and categorization,” he said, describing the capabilities of Mahout. “We also added in frequent pattern mining and collaborative filtering for what some people call recommendation systems. We've also got some evolutionary capabilities as well.”
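The collaborative filtering Ingersoll mentions can be illustrated with a toy, single-machine sketch: score the items a user has not yet rated by similarity-weighted ratings from other users. (Mahout's real implementations run these computations as parallel Hadoop jobs over much larger data; the function names below are invented for illustration.)

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse rating dicts (item -> rating)."""
    common = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in common)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def recommend(ratings, user, top_n=2):
    """Toy user-based collaborative filtering: rank unseen items by
    similarity-weighted ratings from the other users."""
    mine = ratings[user]
    scores = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(mine, theirs)
        for item, rating in theirs.items():
            if item not in mine:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Given a user whose ratings closely track another user's, the sketch recommends the items that the similar user rated but the first user has not seen.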
Earlier this year, the Apache Software Foundation created five new top-level projects, one of which is also associated with Hadoop: the Apache Unstructured Information Management Architecture (UIMA). The UIMA project was originally created by IBM and was donated to the Apache Foundation in 2006 as an incubator project. UIMA is a framework for analyzing unstructured data sets, such as natural language texts.
The Apache Cassandra project also became a top-level project in April of 2010. This NoSQL distributed database was recently updated to version 0.6.1, and gained the ability to push information stored in Cassandra into running Hadoop clusters.