Apache is more than just building Web servers: We do science too!

The Apache Software Foundation is one of the leading open-source clearinghouses responsible for the technology that empowers the Internet, like the HTTPD Web server; the technology that powers Big Data, like Hadoop; and more recently the technology that powers consumer office productivity applications, like OpenOffice. This is common knowledge in the tech sector. What isn’t common knowledge in the tech sector is that the ASF has grown in recent years from being solely focused on technology communities to being also focused on communities that support science. Yes, science people. Apache does science too.

Take the Apache OODT project, which originated within the walls of NASA over the last decade and has drawn on substantial staff time from NASA, other government agencies and university partners. OODT allows the general software enthusiast to manage data the same way that NASA’s next generation of remote sensing missions do, and the same way that NASA’s Planetary Data System does. (PDS is the archive for all planetary missions over the last 40 years.)

Besides NASA, OODT includes a number of contributors: from next-generation astronomical ground-based instruments like the Square Kilometre Array (which will generate over 700TB of data per second when it sees first light in 2020); from Big Data efforts in climate science; and from biomedical informatics systems at the U.S. National Cancer Institute, helping in the management of data related to the early detection of cancer in the Early Detection Research Network project.

OODT is also used at Children’s Hospital Los Angeles to collect and manage data taken from the Laura P. and Leland K. Whittier Virtual Pediatric Intensive Care Unit. Climate scientists, decision and policy makers, astronomers, clinicians, professors, and students use systems powered by OODT on a day-to-day basis.

In 2012, a sibling to OODT at Apache emerged in the Airavata project. Originally funded by the U.S. National Science Foundation over a number of years via the TeraGrid project, and now via the Extreme Science and Engineering Discovery Environment project, Airavata provides a fully baked software framework for the development of science gateways: user portals where scientists in the natural, physical, Earth and astronomical sciences, high-performance computing researchers, and visualization experts can run scientific workflows, download datasets, and share their results with other researchers.
Being at Apache has benefitted Airavata by encouraging its developers to integrate with other Apache efforts, specifically Apache Jackrabbit, a content repository, and OODT, which it leverages for provenance and metadata management.

In the Apache Incubator, cTAKES is a project born out of the Strategic Health IT Advanced Research Projects program from the U.S. Department of Health and Human Services. cTAKES is a natural-language processing toolkit for making sense of free-text notes and other free-text information regularly captured in clinical environments. It can identify medications, diseases, disorders, symptoms, anatomical sites and procedures, family information, and the location and severity of a clinical condition.

More and more federal agencies within the U.S. are seeing Apache as a home for the software that they have funded over the years, software that normally would vaporize as soon as the standard 3-5 year grant expired and the principal investigator (PI) and the team behind the resultant software moved on to the next project. Several agencies, including NASA, DARPA, the National Science Foundation and the National Institutes of Health, encourage current grant writers and prospective PIs to include a plan for software dissemination and sustainability in their proposals.

PI-led science can see real benefits from using the Apache Software Foundation, or by embracing Apache’s principles:
• Project and committer diversity, which allows a project to go on even if any individual or organization pulls its support
• Meritocracy that is inclusive of participants regardless of whether they write code or documentation or provide design guidance
• Release of software under the academic-permissive Apache License version 2 (ALv2), which doesn’t carry with it downstream “gotchas” for consumers of ALv2-licensed components
• A focus on building software communities that are lasting

If you’re in Portland, Ore., from Feb. 24-28, you can get more information about Apache’s presence in scientific projects by attending the Apache in Science track at ApacheCon. The two-day track will have both a framework and an application feel, giving you the technical nuts and bolts of the OODT and Airavata frameworks. It will then cover their use in projects ranging from bioinformatics and the EDRN project, to radio astronomy, to the next-generation Polar Orbiting Satellite missions in Earth science, to petascale climate modeling with the Earth System Grid Federation. In addition, representatives from the National Science Foundation will be on hand to discuss the agency’s strategy and policy for open source and sustainability.

Most folks think of the Web server, the Big Data projects, the user interfaces, the cloud technologies, and more recently the office productivity suites when they think of Apache, but they haven’t heard much about Apache’s strong footing in the science community. If you are currently working on scientific software and are deciding on a home for your project, Apache will gladly welcome and help grow it into a sustainable, well-recognized effort.

Chris Mattmann is a Senior Computer Scientist at NASA JPL, and also a Professor of Computer Science at the University of Southern California. He is also VP for the Apache OODT project.