
Performance Testing for Multicore Systems




July 3, 2007
Are your company's applications ready to run on multiple cores? Your company has probably already deployed multicore desktop or laptop machines, though perhaps they are not yet running at full capability. Will your applications run faster on them, or run at all? Do you even have a means of finding out?

Before attempting to test and benchmark a multicore-enabled application, you'll need to understand some basic concepts for building scalable applications. For that, I called on Jim Falgout, an expert on application parallelism and solutions architect at Pervasive Software, which makes embeddable integration, performance and security solutions.

Decreasing 'Wall Clock Time'
"With multiple-processor cores, the obvious thing we want an application to do is to take advantage of all of the cores available to decrease its 'wall clock time,'" he says. Wall clock time is the total running time of an application from start to finish. "An application that scales well will have a decreasing wall clock time as the numbers of resources (cores) are added to the system under test."

One basic way of doing this, Falgout says, is to overlap application I/O with application compute cycles. "This can be done using simple double buffering," also known as a producer-consumer technique. The producer (disk reader) and consumer (compute node) each run in their own process or thread. This method requires a means of communicating data between the two threads and enough buffer space to ensure that one thread doesn't starve the other.
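As a rough illustration of the producer-consumer idea, here is a minimal Python sketch. It is not Falgout's code: the function names, the in-memory chunks standing in for disk reads, and the summing "compute" step are all assumptions made for the example. A bounded queue with two slots plays the role of the double buffer.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-input marker; real chunks must not be None


def producer(chunks, buffer):
    """Stand-in for a disk reader: hands each chunk to the consumer."""
    for chunk in chunks:
        buffer.put(chunk)  # blocks while both buffer slots are full
    buffer.put(SENTINEL)


def consumer(buffer, results):
    """Stand-in for the compute step: sums each chunk it receives."""
    while True:
        chunk = buffer.get()  # blocks while the buffer is empty
        if chunk is SENTINEL:
            break
        results.append(sum(chunk))


def run_double_buffered(chunks):
    buffer = queue.Queue(maxsize=2)  # two slots: one being filled, one being drained
    results = []
    reader = threading.Thread(target=producer, args=(chunks, buffer))
    worker = threading.Thread(target=consumer, args=(buffer, results))
    reader.start()
    worker.start()
    reader.join()
    worker.join()
    return results
```

Calling run_double_buffered([[1, 2], [3, 4], [5]]) returns the per-chunk sums in input order. The queue size is the tuning knob: too small and the reader stalls waiting for the compute thread, too large and memory is spent on chunks that are only waiting.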

Another technique for overlapping I/O and compute is to use a pipelined architecture. "With this architecture, all functions of the application are split into nodes, including I/O and compute functions. Each function is linked to the other using a data queue," Falgout explains. This may be an in-memory queue for functions that live within the same process, a shared memory queue for inter-process communication or a network socket for inter-machine hookups, he says. "A pipelined architecture is flexible in that it allows functions to be wired together in different ways to form complex applications. It can also produce highly scalable applications since each function within the application runs in its own thread or process." Pipelining can add scalability to applications that involve algorithms that might not otherwise scale.
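A pipeline of this kind might be sketched as follows, using in-memory queues and one thread per node. The names stage and run_pipeline and the queue size are assumptions for illustration only; a production version would also need error handling so that a failing node does not leave the others waiting.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker passed from node to node


def stage(func, inbox, outbox):
    """One pipeline node: apply func to each item and pass the result on."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)  # tell the next node the stream has ended
            return
        outbox.put(func(item))


def run_pipeline(items, funcs, queue_size=8):
    """Wire funcs together with bounded in-memory queues, one thread per node."""
    queues = [queue.Queue(maxsize=queue_size) for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(funcs)
    ]

    def feed():
        for item in items:
            queues[0].put(item)
        queues[0].put(DONE)

    # Feed from a separate thread so the caller can drain the last queue
    # at the same time; otherwise full queues could block everything.
    threads.append(threading.Thread(target=feed))
    for thread in threads:
        thread.start()

    results = []
    while True:
        out = queues[-1].get()
        if out is DONE:
            break
        results.append(out)
    for thread in threads:
        thread.join()
    return results
```

Because every node only knows about its inbox and outbox, the same functions can be rewired in a different order, or a queue can be swapped for a socket, without touching the node logic.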

Data Partitioning: Choose Your Method
Another technique is data partitioning, of which there are two methods. The first is horizontal partitioning, which uses a divide-and-conquer principle. "The data is segmented into a number of partitions. Each partition instantiates a copy of the functions to apply to the data." The number of partitions to use may be based on the number of cores available (dynamic) or on some known division of the data (static). "This technique allows a possibly costly calculation to be spread across many cores."
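Horizontal partitioning could look something like the sketch below. The helper names and the sum-of-squares calculation are invented for the example. It uses a thread pool to keep the code short; in CPython, a CPU-bound calculation would typically need a process pool instead to actually occupy several cores.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split(data, partitions):
    """Cut data into `partitions` contiguous slices of near-equal size."""
    size, extra = divmod(len(data), partitions)
    slices, start = [], 0
    for i in range(partitions):
        end = start + size + (1 if i < extra else 0)
        slices.append(data[start:end])
        start = end
    return slices


def sum_of_squares(part):
    """The function each partition applies to its own slice of the data."""
    return sum(x * x for x in part)


def partitioned_sum_of_squares(data, partitions=None):
    # Dynamic partitioning: default to one partition per available core.
    # Pass an explicit number for a static division of the data.
    partitions = partitions or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        partials = pool.map(sum_of_squares, split(data, partitions))
        return sum(partials)
```

Each partition works on its own slice and shares nothing with the others until the partial results are combined at the end, which is what lets the calculation spread across cores.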

The second partitioning method is vertical partitioning, which allows row set data to be split into individual columns for separate processing. This boosts performance by allowing only needed data to be moved through the application. It also provides scalability by allowing different threads to control each data column, but must be used wisely, says Falgout. "Input datasets for an application may contain many hundreds of columns. Overuse of vertical partitioning can lead to too many threads being created, stressing the underlying operating system or virtual machine."
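To make the vertical case concrete, here is a small sketch under stated assumptions: rows are plain dictionaries, the per-column work is a simple mean, and the names to_columns and column_means are hypothetical. The cap on worker threads reflects Falgout's warning about creating too many threads.

```python
from concurrent.futures import ThreadPoolExecutor


def to_columns(rows, needed):
    """Split row-oriented records into one list per needed column."""
    return {name: [row[name] for row in rows] for name in needed}


def mean(values):
    return sum(values) / len(values)


def column_means(rows, needed, max_workers=4):
    # Only the needed columns are extracted, so unused data never moves
    # through the rest of the application.
    columns = to_columns(rows, needed)
    # Cap the pool so hundreds of columns do not become hundreds of threads.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(mean, values) for name, values in columns.items()}
        return {name: future.result() for name, future in futures.items()}
```

With rows holding columns a, b and c, asking for only a and b means column c is never copied or processed at all.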

Hot Spots and Hot Locks
Algorithmic tuning is another common technique for decreasing an application's runtime. The traditional way of tuning algorithms can involve the use of a profiler to determine algorithm "hot spots," or sections of code that run the slowest. "The programmer then attacks that code, looking for ways to get better performance out of the algorithm."
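For readers who have not used a profiler, the following sketch shows the general idea with Python's standard cProfile and pstats modules. The workload functions are made up for the example, and the article does not say which profiler Falgout had in mind.

```python
import cProfile
import io
import pstats


def slow_part(n):
    """Deliberately expensive: the hot spot we expect the profiler to find."""
    return sum(i * i for i in range(n))


def fast_part(n):
    return n * 2


def workload():
    return slow_part(200_000) + fast_part(10)


def profile_report(func, top=5):
    """Run func under the profiler and return a text report of the costliest calls."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)  # rank by cumulative time
    return out.getvalue()
```

Printing profile_report(workload) lists the functions ranked by cumulative time, and slow_part shows up near the top, which is where a programmer would start looking.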

This technique is still applicable on multicore systems, Falgout says, but the methods for tuning algorithms have changed. "To get scalability out of an algorithm, it must be chopped in some way to allow multiple threads of control to implement the algorithm in parallel. Tuning the algorithm now relies on multithreaded tuning techniques such as looking for a 'hot lock' or a lock that may be too granular, causing the algorithm to lose scalability."
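One common form of hot lock is a single lock that every thread must take on every update. The sketch below contrasts that with a version in which each thread tallies privately and takes the lock only once to merge. The word-counting task and all function names are assumptions chosen for illustration, not something described in the interview.

```python
import threading
from collections import Counter


def _run(work, chunks):
    """Start one thread per chunk and wait for all of them to finish."""
    threads = [threading.Thread(target=work, args=(chunk,)) for chunk in chunks]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


def count_words_hot_lock(chunks):
    """Every single update takes the same lock, so threads queue up behind it."""
    totals, lock = Counter(), threading.Lock()

    def work(chunk):
        for word in chunk:
            with lock:  # contended once per word
                totals[word] += 1

    _run(work, chunks)
    return totals


def count_words_local_merge(chunks):
    """Each thread tallies privately, then takes the lock once to merge."""
    totals, lock = Counter(), threading.Lock()

    def work(chunk):
        local = Counter(chunk)  # no shared state is touched here
        with lock:  # one short critical section per thread
            totals.update(local)

    _run(work, chunks)
    return totals
```

Both functions return the same counts. The difference is how often threads collide on the lock, which is the kind of thing multithreaded tuning looks for once the single-threaded hot spots have been dealt with.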

Endless variations on these techniques and others can be used to create scalable, high-performance applications. The ones discussed here are some of the first an application writer will likely use to produce scalable data-processing applications.

You've got one week to absorb and apply these techniques. Next Tuesday, I'll share Falgout's techniques for measuring application performance improvements when optimized for scalability and multicore systems.


Share this link: http://www.sdtimes.com/link/30874
 
