IBM’s M2 corrals massive data sets with Hadoop




October 2, 2009
With 1,386 members making up the two houses of the Parliament of the United Kingdom, there is certainly no shortage of government data flowing from the territories of Great Britain and Northern Ireland. Bills must be voted upon, elections must be carried out, and many other actions must be tracked.

That is one of the reasons why IBM created M2, an enterprise data analysis platform. M2, announced today at Hadoop World in New York, aims to help organizations better gather important government and business data. It was built using Apache Hadoop, an open-source Java framework that enables applications to work with large sets of data.

M2 is IBM’s latest Web 2.0 technology, joining the ranks of the Mashup Center mashup platform and WebSphere sMash Web application development environment.

Rod Smith, vice president of IBM’s emerging technologies unit, said M2 is different from other data analyzers because it is flexible and able to scale to large data sets. It can also integrate with other visualization and analytic engines, such as IBM’s Cognos business intelligence software.

Smith said customers spoke about how they didn’t know how to harvest vast amounts of data properly for business intelligence and analytics. “We scratched our heads about it for a while, and then when the Hadoop project got started up, it looked like a good foundation to build on where we could explore the idea of doing do-it-yourself analytics,” he said.

“It’s about deeper intelligence that’s more exploratory than what you’d think about from a data warehouse.”

In a demo with SD Times, IBM showed a BBC data mashup called “Digital Democracy,” which sifts through government-published data and makes that information easier to access for BBC journalists. The mashup can show which members of Parliament are working on what bills, as well as voting records, demographic trends and many other data points.

M2 has a spider that crawls the Internet to retrieve content, but content can come from other sources, such as internal databases. In the case of the “Digital Democracy” mashup, the spider collected a few million pages of content over four days, according to Stewart Nickolas, a distinguished engineer for IBM’s emerging technologies unit. For a crawl, a user identifies the URLs to begin with and how broad a search to conduct.
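As a rough illustration of a crawl that starts from user-chosen URLs and is bounded in scope, the sketch below does a breadth-first walk to a fixed link depth. It is not M2's spider: the `crawl` function, the `max_depth` parameter, and the in-memory `LINKS` graph are all assumptions made for the example, and no network access is involved.

```python
from collections import deque

def crawl(seeds, get_links, max_depth):
    """Breadth-first crawl from seed URLs, following links up to max_depth hops."""
    seen = set(seeds)
    queue = deque((url, 0) for url in seeds)
    order = []
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # scope limit reached: do not follow links from this page
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order

# Hypothetical link graph standing in for real pages (assumption for the example).
LINKS = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
}

pages = crawl(["a"], lambda url: LINKS.get(url, []), max_depth=2)
print(pages)
```

Raising `max_depth` widens the search; with a depth of 2, page "e" is never visited because it sits three hops from the seed.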

Content is presented in a spreadsheet-like format. When information is brought back from a crawl, a user can analyze the data by saving it as a new collection and introducing a text extractor into the collection. An extractor can categorize information as people, places and things from an arbitrary piece of content, Nickolas said. A filtering capability within the extractor allows the user to focus on information that is relevant to them.
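To make the idea of an extractor with a filter concrete, here is a minimal sketch that tags terms as people, places or things by looking them up in a small table. The article does not describe how M2's extractors work, so the `GAZETTEER` table, the `extract` function, its `keep` parameter, and the sample names are all invented for illustration.

```python
import re

# Hypothetical lookup table standing in for a real text-analytics engine.
GAZETTEER = {
    "person": ["Jane Doe", "John Roe"],
    "place": ["London", "Belfast"],
    "thing": ["Finance Bill"],
}

def extract(text, keep=None):
    """Return (category, term) pairs found in text; keep limits the categories."""
    found = []
    for category, terms in GAZETTEER.items():
        if keep is not None and category not in keep:
            continue  # filter: skip categories the user is not interested in
        for term in terms:
            if re.search(r"\b" + re.escape(term) + r"\b", text):
                found.append((category, term))
    return found

text = "Jane Doe spoke in London about the Finance Bill."
everything = extract(text)
people_only = extract(text, keep={"person"})
print(everything)
print(people_only)
```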

“We’re pluggable up and down the stack from an architecture perspective, so you can plug in many different text analytics or extraction capabilities,” Nickolas said. “You can create a flow of successive transformations to very large sets of unstructured data.”

M2 uses Hadoop’s Pig data analysis platform behind the scenes to run dataflow transformations. There is a “combine” feature that can bring together information from different flows created by the user to link up categories from different lists. A pivot table can further group information into columns in order to show a specific data characteristic, such as how many times a Parliament member voted “yes” on a bill, Nickolas said.
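The combine and pivot steps can be pictured as a join followed by a grouped count. The sketch below does both in plain Python on a handful of made-up rows. It is an assumption-laden illustration, not M2's code or a Pig script, and the `combine` and `pivot` helpers are hypothetical names.

```python
from collections import Counter, defaultdict

# Made-up rows standing in for two user-created flows.
votes = [
    {"member": "Member A", "bill": "Bill 1", "vote": "yes"},
    {"member": "Member A", "bill": "Bill 2", "vote": "no"},
    {"member": "Member B", "bill": "Bill 1", "vote": "yes"},
    {"member": "Member B", "bill": "Bill 2", "vote": "yes"},
]
members = [
    {"member": "Member A", "party": "Party X"},
    {"member": "Member B", "party": "Party Y"},
]

def combine(left, right, key):
    """Join two lists of rows on a shared key (an inner join)."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]

def pivot(rows, index, column):
    """Count rows per (index value, column value), like a pivot table of counts."""
    table = defaultdict(Counter)
    for row in rows:
        table[row[index]][row[column]] += 1
    return table

combined = combine(votes, members, "member")
table = pivot(combined, "member", "vote")
print(table["Member B"]["yes"])
```

Here `table["Member B"]["yes"]` answers the kind of question mentioned above: how many times a given member voted "yes".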

Other features of M2 include a formula editor that can count up particular statistics on a table, as well as a bubble chart for displaying information.

Data can be exported in formats such as CSV, JSON and RSS feeds. Nickolas noted that while some users might need to export terabytes’ worth of data, a key element of Hadoop is “to bring computation to data because it’s so large.”
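For a sense of what CSV and JSON export of a small result table looks like, the following sketch serializes the same rows both ways using Python's standard library. The rows and the `to_csv` and `to_json` helpers are assumptions for the example. M2's own export mechanism is not described in the article, and RSS is left out for brevity.

```python
import csv
import io
import json

# Made-up result rows (assumption for the example).
rows = [
    {"member": "Member A", "yes_votes": 1},
    {"member": "Member B", "yes_votes": 2},
]

def to_csv(rows):
    """Serialize a list of dicts to CSV text, using the first row's keys as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def to_json(rows):
    """Serialize a list of dicts to indented JSON text."""
    return json.dumps(rows, indent=2)

csv_text = to_csv(rows)
json_text = to_json(rows)
print(csv_text)
print(json_text)
```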

Smith said IBM will continue to evolve M2 in order to further improve its data-harvesting capabilities. “We’re getting closer to having this high-level, spreadsheet-like tooling so that domain experts don’t have to send data off to external organizations or centralized [business intelligence] type of groups.”

