Performance Testing for Multicore Systems
By Edward J. Correia, July 3, 2007
Are your company's applications ready to run on multiple cores? Your company has likely already deployed multicore desktop or laptop machines, though perhaps they are not yet running at full capability. Will your applications run faster on them, or run at all? Do you even have a means of finding out?
Before attempting to test and benchmark a multicore-enabled application, you'll need to understand some basic concepts for building scalable applications. For that, I called on Jim Falgout, an expert on application parallelism and solutions architect at Pervasive Software, which makes embeddable integration, performance and security solutions.
Decreasing 'Wall Clock Time'
"With multiple-processor cores, the obvious thing we want an application to do is to take advantage of all of the cores available to decrease its 'wall clock time,'" he says. Wall clock time is the total running time of an application from start to finish. "An application that scales well will have a decreasing wall clock time as the numbers of resources (cores) are added to the system under test."
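To make the term concrete, here is a minimal sketch of measuring wall clock time in Python. The helper names (`wall_clock_seconds`, `speedup`) and the `sum_of_squares` workload are hypothetical stand-ins for illustration, not anything Falgout described.

```python
import time

def wall_clock_seconds(task, *args):
    """Return (result, elapsed seconds) for one run of task, start to finish."""
    start = time.perf_counter()  # monotonic clock suited to measuring intervals
    result = task(*args)
    return result, time.perf_counter() - start

def speedup(baseline_seconds, parallel_seconds):
    """How many times faster the parallel run finished than the baseline run."""
    return baseline_seconds / parallel_seconds

def sum_of_squares(n):
    # Hypothetical workload standing in for a real application run.
    return sum(i * i for i in range(n))

total, elapsed = wall_clock_seconds(sum_of_squares, 100_000)
```

Running the same workload with one core and then with several, and comparing the two elapsed times through something like `speedup`, is one simple way to see whether wall clock time actually falls as cores are added.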
One basic way of doing this, Falgout says, is to overlap application I/O with application compute cycles. "This can be done using simple double buffering," also known as a producer-consumer technique. The producer (disk reader) and consumer (compute node) each run in their own process or thread. This method requires a means of communicating data between the two threads and enough buffer space to ensure that one thread doesn't starve the other.
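A minimal sketch of that producer-consumer arrangement, assuming Python threads and a bounded two-slot queue as the buffer. The in-memory `chunks` list stands in for data read from disk, and all function names are illustrative.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the input

def producer(chunks, buffers):
    # Stands in for the disk reader; put() blocks while both buffers are full.
    for chunk in chunks:
        buffers.put(chunk)
    buffers.put(SENTINEL)

def consumer(buffers, results):
    # Stands in for the compute node; get() blocks until a buffer is ready.
    while True:
        chunk = buffers.get()
        if chunk is SENTINEL:
            break
        results.append(sum(chunk))

def run_double_buffered(chunks):
    buffers = queue.Queue(maxsize=2)  # two slots: one filling, one being processed
    results = []
    reader = threading.Thread(target=producer, args=(chunks, buffers))
    worker = threading.Thread(target=consumer, args=(buffers, results))
    reader.start()
    worker.start()
    reader.join()
    worker.join()
    return results

chunk_sums = run_double_buffered([[1, 2, 3], [4, 5], [6]])
```

The queue is both the communication channel and the buffer space. Its size is the tuning knob: too small and the consumer may wait on the reader, too large and memory is spent on data nobody is ready to process. In CPython, blocking I/O releases the global interpreter lock, so a real disk reader can overlap with compute even though both run as threads.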
Another technique for overlapping I/O and compute is to use a pipelined architecture. "With this architecture, all functions of the application are split into nodes, including I/O and compute functions. Each function is linked to the other using a data queue," Falgout explains. This may be an in-memory queue for functions that live within the same process, a shared memory queue for inter-process communication or a network socket for inter-machine hookups, he says. "A pipelined architecture is flexible in that it allows functions to be wired together in different ways to form complex applications. It can also produce highly scalable applications since each function within the application runs in its own thread or process." Pipelining can add scalability to applications built on algorithms that might not otherwise scale.
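The in-memory variant can be sketched as follows, assuming one Python thread per node and a bounded queue between neighbors. The names `stage` and `run_pipeline` are hypothetical, and the two functions wired together at the end are placeholders for real I/O and compute steps.

```python
import queue
import threading

DONE = object()  # shutdown marker passed from node to node

def stage(func, inbox, outbox):
    """Run one pipeline node: apply func to each item and pass it downstream."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)  # propagate shutdown to the next node
            return
        outbox.put(func(item))

def run_pipeline(items, funcs, queue_size=4):
    # One queue in front of each node, plus one for the final output.
    queues = [queue.Queue(maxsize=queue_size) for _ in range(len(funcs) + 1)]

    def feed():
        for item in items:
            queues[0].put(item)
        queues[0].put(DONE)

    threads = [threading.Thread(target=feed)]
    threads += [
        threading.Thread(target=stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(funcs)
    ]
    for thread in threads:
        thread.start()

    results = []
    while True:  # drain the last queue so upstream nodes never stall on a full one
        item = queues[-1].get()
        if item is DONE:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results

parsed_and_scaled = run_pipeline(["1", "2", "3"], [int, lambda x: x * 10])
```

Because the nodes only know about their input and output queues, rewiring the application is a matter of changing the list of functions. Swapping the in-memory queues for shared memory or sockets would change the transport but not the shape of the design.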
Data Partitioning: Choose Your Method
Another technique is data partitioning, of which there are two methods. The first is horizontal partitioning, which uses a divide-and-conquer principle. "The data is segmented into a number of partitions. Each partition instantiates a copy of the functions to apply to the data." The number of partitions to use may be based on the number of cores available (dynamic) or on some known division of the data (static). "This technique allows a possibly costly calculation to be spread across many cores."
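Here is a rough sketch of horizontal partitioning with a dynamic partition count, assuming Python's `concurrent.futures` thread pool. The `partition`, `costly` and `partitioned_sum` names are hypothetical, and `costly` is a placeholder for whatever function is applied to each partition.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def partition(data, parts):
    """Split data into `parts` contiguous slices of near-equal size."""
    size, extra = divmod(len(data), parts)
    slices, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        slices.append(data[start:end])
        start = end
    return slices

def costly(values):
    # Hypothetical per-partition calculation.
    return sum(v * v for v in values)

def partitioned_sum(data, parts=None):
    parts = parts or os.cpu_count() or 1  # dynamic: one partition per core
    with ThreadPoolExecutor(max_workers=parts) as pool:
        # Each partition gets its own copy of the work; results are merged at the end.
        return sum(pool.map(costly, partition(data, parts)))

total = partitioned_sum(list(range(1000)))
```

Passing an explicit `parts` value corresponds to the static case. In CPython, the global interpreter lock keeps threads from running pure-Python compute in parallel, so a CPU-bound version would likely use `ProcessPoolExecutor`, which offers the same `map` interface.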
The second partitioning method is vertical partitioning, which allows row set data to be split into individual columns for separate processing. This boosts performance by allowing only needed data to be moved through the application. It also provides scalability by allowing different threads to control each data column, but must be used wisely, says Falgout. "Input datasets for an application may contain many hundreds of columns. Overuse of vertical partitioning can lead to too many threads being created, stressing the underlying operating system or virtual machine."
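As an illustration, the sketch below splits a small row set into columns, carries only the needed ones forward, and caps the number of worker threads so that a wide dataset cannot create hundreds of them. The sample rows, the column names and the `max_workers` default are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def to_columns(rows, needed):
    """Turn row dicts into per-column lists, keeping only the needed columns."""
    return {name: [row[name] for row in rows] for name in needed}

def column_totals(rows, needed, max_workers=4):
    columns = to_columns(rows, needed)
    # A bounded pool caps the thread count even if `needed` names hundreds of columns.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        totals = pool.map(sum, columns.values())
        return dict(zip(columns.keys(), totals))

rows = [
    {"id": 1, "price": 10.0, "qty": 2, "note": "a"},
    {"id": 2, "price": 5.0, "qty": 4, "note": "b"},
]
totals = column_totals(rows, needed=["price", "qty"])
```

The `id` and `note` columns never leave `to_columns`, which is where the reduction in data movement comes from. The bounded pool is one way to address the thread-count concern Falgout raises.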
Hot Spots and Hot Locks
Algorithmic tuning is another common technique for decreasing an application's runtime. The traditional way of tuning algorithms can involve the use of a profiler to determine algorithm "hot spots," or sections of code that run the slowest. "The programmer then attacks that code, looking for ways to get better performance out of the algorithm."
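For a sense of what that looks like in practice, here is a minimal sketch using Python's standard `cProfile` and `pstats` modules. The `slow_part`, `fast_part` and `workload` functions are invented for the example, with `slow_part` deliberately doing most of the work.

```python
import cProfile
import io
import pstats

def slow_part(n):
    # Deliberately expensive: this is the hot spot the profile should surface.
    return sum(i * i for i in range(n))

def fast_part(n):
    return n * 2

def workload():
    return slow_part(200_000) + fast_part(10)

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Write the five most expensive entries, by cumulative time, into a string.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)
profile_text = report.getvalue()
```

Printing `profile_text` shows a table of call counts and timings. The functions near the top are the candidates for tuning.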
This technique is still applicable on multicore systems, Falgout says, but the methods for tuning algorithms have changed. "To get scalability out of an algorithm, it must be chopped in some way to allow multiple threads of control to implement the algorithm in parallel. Tuning the algorithm now relies on multithreaded tuning techniques such as looking for a 'hot lock' or a lock that may be too granular, causing the algorithm to lose scalability."
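The sketch below contrasts two ways of counting across threads, reading "hot lock" as a lock that every thread must acquire on every step (an interpretation, since the article does not define the term further). Both functions are hypothetical and return the same total. They differ only in how often the shared lock is taken.

```python
import threading

def run_threads(work, threads):
    pool = [threading.Thread(target=work) for _ in range(threads)]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()

def count_with_hot_lock(items_per_thread, threads=4):
    total = 0
    lock = threading.Lock()

    def work():
        nonlocal total
        for _ in range(items_per_thread):
            with lock:  # every thread contends here on every item
                total += 1

    run_threads(work, threads)
    return total

def count_with_local_totals(items_per_thread, threads=4):
    total = 0
    lock = threading.Lock()

    def work():
        nonlocal total
        local = 0
        for _ in range(items_per_thread):
            local += 1  # no shared state is touched inside the loop
        with lock:  # one acquisition per thread
            total += local

    run_threads(work, threads)
    return total

hot = count_with_hot_lock(1000)
cool = count_with_local_totals(1000)
```

In the first version the threads spend much of their time queuing for the lock, so adding cores adds contention rather than throughput. The second version keeps the work private to each thread and merges once, which is the kind of restructuring that multithreaded tuning tends to look for.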
Endless variations on these techniques and others can be used to create scalable, high-performance applications. The ones discussed here are some of the first an application writer will likely use to produce scalable data-processing applications.
You've got one week to absorb and apply these techniques. Next Tuesday, I'll share Falgout's techniques for measuring application performance improvements when optimized for scalability and multicore systems.