Think Like a Customer, Use Your Stopwatch


Hands-on developers give their opinions of the various performance optimization strategies



October 15, 2004

Software performance tuning can get downright emotional. When their code isn’t working right, you might almost see programmers cry, or swear. Maybe both.

Often behind this emotion, however, is a cool and clear-eyed assessment of what it takes to tune and optimize software. Tools for profiling and debugging code can help, but many experienced developers still lean heavily on common sense and simple techniques.

One of these techniques is to stop tuning when the effort is no longer worth it. Another is to think holistically, even while delving into the minutiae of an optimization project.

As a software scientist at Adobe Systems Inc., Rich Gerber spends much time in minutiae, so much that he still regularly writes in assembly language. But his performance testing of Adobe Premiere often doesn’t involve sophisticated tools or arcane knowledge.

“I just use the application the way a user would while using the simplest of all tools—a stopwatch,” said Gerber. Much of his time is spent trying out the MPEG encoder or filter, just as any video enthusiast might.

Gerber concedes that he does use some of Adobe’s own testing and instrumentation tools, as well as products such as Intel’s VTune Analyzer, to see which parts of the application are running slower than the rest.

Tuning can be as simple as turning on compiler flags and seeing what happens. “Sometimes, that gives you the performance you are looking for and you’re done,” he said.

When you still need more performance, changing the algorithm or optimizing for some of the newer processor instructions, such as single-instruction, multiple-data (SIMD) instructions, can boost things even more.

While working on tuning tasks, Gerber and his team of developers avoid tweaks that would be too difficult to maintain in the future. “Avoid creating unmaintainable code in the name of performance,” said Gerber. “If the original author gets hit by a bus, you want to be able to make changes.”

Focusing on Users
Musicmatch Inc.’s Randy Camp agrees that keeping a sense of proportion and balance is important. “We have to be sure that we are addressing the most used parts of our product and that we are focusing on the slowest parts, instead of the parts that are just easy to optimize,” said Camp, vice president of software research and development at the company, which Yahoo Inc. agreed to buy last month for US$160 million.

When the focus is desktop software, bottlenecks are usually easy to spot. Users expect a high level of performance, and sluggish response times are obvious to anyone clicking around in the program. Even though desktop software vendors are managing larger data sets and offering more features, ever-faster PCs can more than keep up.

On the server side it’s a bit trickier. Code has to be tuned in the lab based on best guesses. Once deployed, the application’s performance must be continually monitored since the server load is not always predictable.

At Musicmatch, this monitoring is a pillar of the company’s ongoing optimization efforts. “If our Web site or Web services that are used by the desktop products are sluggish, all of a sudden our users have a bad experience,” said Camp.

Sluggish code and surly customers can be costly. So when faced with a bottleneck, the first step for Musicmatch is to localize it. Having an application that’s well-factored and decomposable can make it easier to track down the subsystem causing the slow performance. The less code to instrument and profile, the better, said Camp.

Next, Musicmatch measures the existing performance of the target subsystem to create a reference so its developers know how much they’re really improving things when they make changes later. “A unit test or other exerciser code is very useful here,” said Camp. “We want to be sure that we are measuring exactly the same operation every time.”
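A reference measurement of the kind Camp describes can be sketched in a few lines of Python. This is only an illustration: `build_playlist` is a hypothetical stand-in for the subsystem under test, and neither it nor `measure` comes from Musicmatch.

```python
import statistics
import time


def build_playlist(track_count):
    """Hypothetical stand-in for the subsystem being measured."""
    return sorted(range(track_count), key=lambda n: (n * 7919) % 1009)


def measure(operation, *args, repeats=5):
    """Run exactly the same operation several times; return the median seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation(*args)
        samples.append(time.perf_counter() - start)
    # The median is less sensitive than the mean to one unusually slow run.
    return statistics.median(samples)


if __name__ == "__main__":
    baseline = measure(build_playlist, 50_000)
    print(f"baseline median: {baseline * 1000:.2f} ms")
```

Recording the baseline before any change, and rerunning the same exerciser with the same input afterward, is what makes the later comparison meaningful.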

Finally, Musicmatch coders use a profiling tool to measure in detail where the time is being spent in the problem subsystem. Based on what they see there, they take one of two actions.

“If there’s an outlier—a function or other segment of code that’s taking a lot more time than the code around it—we attack that particular segment,” said Camp. “If, on the other hand, none of the code in the subsystem appears to be out of line, then we need to scratch our heads and ask ourselves if there’s a way we could redesign the whole subsystem so that it will perform better.”
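The hunt for an outlier might look like the following sketch, which uses Python’s standard `cProfile` and `pstats` modules. The three functions being profiled are hypothetical, and `decode_body` is deliberately written to dominate the run so that it shows up in the report.

```python
import cProfile
import io
import pstats


def parse_header(data):
    """Cheap step: slice off a fixed-size header."""
    return data[:16]


def decode_body(data):
    """Deliberately expensive step: nested loops over the payload."""
    total = 0
    for i in range(len(data)):
        for j in range(0, len(data), 64):
            total += data[i] ^ data[j]
    return total


def handle_request(data):
    """Hypothetical subsystem entry point that calls both steps."""
    parse_header(data)
    return decode_body(data)


def profile_report(func, *args, top=5):
    """Profile one call and return the top entries by cumulative time as text."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_report(handle_request, bytes(range(256)) * 4))
```

If one function accounts for most of the cumulative time, that is the segment to attack; a flat report is the cue to reconsider the design of the whole subsystem.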

Staying Responsive
Looking at wholesale redesign of underlying subsystems is a first step for some developers. Roy Goncalves, chief technology officer of Canada-based Info Touch Technologies, which provides Internet kiosk security and management software, said algorithmic optimization is now the centerpiece of his tuning efforts.

Algorithmic optimization looks at altogether new ways to accomplish a task. This is a sharp contrast to making incremental improvements to existing code—what optimization means to most people.

For example, in a distributed application you may spend hundreds of hours to increase performance and throughput by 20 percent by optimizing code. Or you could spend a quarter of that time implementing a concept like data caching and increase performance by 200 percent.
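As a rough illustration of the caching idea, the sketch below wraps a slow lookup in Python’s standard `functools.lru_cache`. The function names and the simulated delay are assumptions made for the example, and it ignores cache invalidation, which a real system would have to handle when the underlying data changes.

```python
import functools
import time

# Counts how often the slow path is actually taken.
CALLS = {"backend": 0}


def fetch_report_from_backend(kiosk_id):
    """Hypothetical slow lookup, simulated here with a short sleep."""
    CALLS["backend"] += 1
    time.sleep(0.01)
    return {"kiosk": kiosk_id, "status": "ok"}


@functools.lru_cache(maxsize=1024)
def fetch_report_cached(kiosk_id):
    """Same lookup, but repeated requests for one kiosk skip the backend."""
    return fetch_report_from_backend(kiosk_id)


if __name__ == "__main__":
    start = time.perf_counter()
    for _ in range(100):
        fetch_report_cached(42)
    elapsed = time.perf_counter() - start
    print(f"100 requests, {CALLS['backend']} backend call(s), {elapsed:.3f} s")
```

The gain comes from not doing the work at all on repeat requests, which is why it can dwarf what line-by-line tuning of the lookup itself would achieve.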

Info Touch has deployed many large networks of public Internet kiosks that are in constant communication with a back-end data warehouse system. The company’s online reporting system has many core areas, including financial reports, status reports, bill pay application usage, and accounting integration, but the biggest bottleneck is handling the roughly 50 million messages a day from its kiosks.

Beyond just handling massive amounts of data, Info Touch’s messaging system had a number of requirements: the ability to travel through any firewall, easy extensibility for new message types and functionality, and guaranteed message delivery and acknowledgement. A traditional Web services architecture using XML worked, but it proved to be a costly solution as the company grew.

XML is flexible, “but the cost of [XML] for our particular solution is a format that adds a lot of additional baggage, both in file size and processing time,” said Goncalves. “We could have tried to optimize [an XML] solution at a code level, but it was pretty clear that the specifications that make XML so powerful would also hinder our ability to create the system that we needed.”

Because Info Touch’s kiosk software and servers know exactly what type of information is being passed back and forth, the company instead created its own messaging specification and associated management tools.

Goncalves said that the system has cut the amount of data transferred by 90 percent, lowered the processing time of this data by adding more intelligence to both the kiosk and the server, and ensured that this intelligence is shared so that the front and back ends more efficiently communicate. “The result of this algorithmic optimization is a system that keeps hardware and operational costs low, helping to ensure a quick return on investment when expanding our kiosk networks,” he said.
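To see why a fixed layout can be so much smaller than XML when both ends already know what is being sent, consider the sketch below. The message fields and the byte layout are invented for illustration and are not Info Touch’s actual specification; the code uses Python’s standard `struct` and `xml.etree.ElementTree` modules.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical layout in network byte order with no padding:
# message type (1 byte), kiosk id (4 bytes), Unix timestamp (4 bytes),
# amount in cents (4 bytes, signed).
STATUS_FORMAT = "!BIIi"


def encode_binary(msg_type, kiosk_id, timestamp, amount_cents):
    """Pack the four fields into a fixed 13-byte message."""
    return struct.pack(STATUS_FORMAT, msg_type, kiosk_id, timestamp, amount_cents)


def decode_binary(payload):
    """Unpack a message produced by encode_binary into a tuple of fields."""
    return struct.unpack(STATUS_FORMAT, payload)


def encode_xml(msg_type, kiosk_id, timestamp, amount_cents):
    """Carry the same four fields as a small XML document, for comparison."""
    root = ET.Element("message", type=str(msg_type))
    ET.SubElement(root, "kioskId").text = str(kiosk_id)
    ET.SubElement(root, "timestamp").text = str(timestamp)
    ET.SubElement(root, "amountCents").text = str(amount_cents)
    return ET.tostring(root)


if __name__ == "__main__":
    fields = (1, 42, 1097798400, 1250)
    print(len(encode_binary(*fields)), "bytes as binary")
    print(len(encode_xml(*fields)), "bytes as XML")
```

The trade-off is the one Goncalves points to: the binary form is compact and cheap to parse, but only because sender and receiver share the layout in advance, giving up the self-describing flexibility of XML.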

Finding the Problems
Knowing where to find bottlenecks and concentrate higher-level optimization efforts is a skill that separates the good programmers and architects from the great ones, Goncalves said. In a distributed system, looking in the right place requires a fairly deep understanding of all the relevant technologies, including database performance, business layer systems, user interface and the usage patterns of customers. This task is made much more complex with modern multi-tiered distributed systems, which have many dependencies, and also many choke points that link different parts of a system together.

Many modern testing tools can cope with complex environments, but sometimes the links in an application’s logic or data chain proliferate faster than any tool can handle. Consider the case of Philippe Lantin, system architect for The Cobalt Group, an application service provider that caters to car dealerships.

Lantin manages a well-tuned, three-tier application stack. Sandwiched between Cobalt’s presentation layer for static content, such as images, and its back-end Oracle database is a middle tier that serves and sorts dynamic content. This mid-tier runs on BEA’s WebLogic platform and takes full advantage of Enterprise JavaBeans, JavaServer Pages and servlets. It also uses a persistence layer for maximum performance.

Despite these technologies, the mid-tier is one of Lantin’s big pain points, and the reasons are mostly outside of his control. It’s the place in the application stack that hooks into a slew of external resources, from major automobile manufacturers to direct marketing firms.

“If I don’t take these external resources into account, I can end up having cascading performance effects in the application,” said Lantin. He spends a lot of time fiddling with timeouts and monitoring negotiated service level agreements. Hardly extreme coding, but it is extremely effective in managing the performance of Cobalt’s hosted services.
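One common way to keep a slow external resource from dragging down everything behind it is to put a deadline on each call and fall back to a default when the deadline passes. The sketch below does this with Python’s standard `concurrent.futures` module; the feed functions are hypothetical, and this is not a description of Cobalt’s own implementation.

```python
import concurrent.futures
import time

# Shared worker pool for calls to external resources.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_timeout(func, timeout_seconds, fallback, *args):
    """Return func(*args), or fallback if no result arrives before the deadline."""
    future = _POOL.submit(func, *args)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        # cancel() only stops a call that has not started yet;
        # a call already running keeps going in its worker thread.
        future.cancel()
        return fallback


def slow_inventory_feed(dealer_id):
    """Hypothetical external resource that responds too slowly."""
    time.sleep(0.3)
    return ["car-1", "car-2"]


def fast_inventory_feed(dealer_id):
    """Hypothetical external resource that responds promptly."""
    return ["car-3"]


if __name__ == "__main__":
    print(call_with_timeout(slow_inventory_feed, 0.05, [], "dealer-1"))
    print(call_with_timeout(fast_inventory_feed, 1.0, [], "dealer-1"))
```

Choosing the timeout value is the fiddly part Lantin describes: too short and healthy partners get cut off, too long and one slow partner ties up workers that other requests need.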

Managing performance of a development team means more than tips, tricks and tools for optimization. The best coders also know when to terminate their tuning efforts.

Adobe’s Gerber addresses when to stop tuning in his 2002 book The Software Optimization Cookbook: High-Performance Recipes for the Intel Architecture, published by Intel Press.

The important idea is to know how close you are to your code’s theoretical maximum performance—what Gerber calls the speed of light.

“Just as we know that rocket ships cannot travel faster than the speed of light, Star Trek excluded,” said Gerber, “developers can take knowledge of today’s processor and memory technologies to roughly calculate maximum performance.”

Suppose you want to find this maximum for a video filter that loads a frame, performs a complex math operation, and then stores the frame. First, strip the algorithm down to the basics: load a frame’s worth of data, execute a divide or some simple but representative operation, and then store the data. Next, Gerber said, ask: “If this was as perfect as possible, how fast would it be?”

One caveat: before relying on your speed-of-light calculation, make sure that the compiler is doing what you expect it to be doing. Sometimes, the compiler optimizes its way to shortcuts, changing what you are really trying to test.

Scratching out these calculations involves nothing more than a bit of core math. If the speed of light isn’t fast enough, then change the algorithm.
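A back-of-the-envelope version of that calculation might look like the sketch below. Every figure in it (frame size, memory bandwidth, operation rate, stopwatch reading) is an assumption chosen for illustration, not a number from Gerber or his book, and the model is deliberately crude: it assumes memory traffic and arithmetic overlap perfectly.

```python
def speed_of_light_seconds(pixels, bytes_per_pixel, bandwidth_bytes_per_s,
                           ops_per_pixel, ops_per_s):
    """Rough lower bound on the time to load, process and store one frame."""
    frame_bytes = pixels * bytes_per_pixel
    # The frame is read once and written once, so its bytes cross the bus twice.
    memory_time = 2 * frame_bytes / bandwidth_bytes_per_s
    compute_time = pixels * ops_per_pixel / ops_per_s
    # With perfect overlap, the slower of the two sets the bound.
    return max(memory_time, compute_time)


def headroom(measured_seconds, bound_seconds):
    """How many times slower the real code is than its theoretical best."""
    return measured_seconds / bound_seconds


if __name__ == "__main__":
    # All figures below are illustrative assumptions, not measurements.
    bound = speed_of_light_seconds(
        pixels=720 * 480, bytes_per_pixel=4, bandwidth_bytes_per_s=2e9,
        ops_per_pixel=1, ops_per_s=1e8)
    measured = 0.25  # hypothetical stopwatch reading, seconds per frame
    print(f"bound: {bound * 1000:.3f} ms per frame")
    print(f"headroom: {headroom(measured, bound):.0f}x")
```

A large headroom figure says there is room to tune; a figure close to 1 says the remaining gains, if any, have to come from a different algorithm.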

Gerber said one such calculation recently helped him think through optimizations to minimize feathering of an image’s edges in Premiere’s rotation function. The underlying in-place operation, source A over source B, was 100 times slower than its speed of light. So there was lots of room for performance improvement.

“On the other hand, if you’re already 99 percent of the way there, it is probably time to work on something else,” said Gerber.

Confronted with a hypothetical choice of his own, Info Touch’s Goncalves will pull the plug on code-level tuning far sooner than on algorithmic optimization. He said that with the exception of games and real-time systems, optimizing code quickly leads to diminishing returns. “Each incremental improvement will require more effort and thought than the previous one, with the result often being wasted effort,” said Goncalves.

Since attempts at algorithmic optimizations are more likely to yield exponential performance boosts, cost savings, or even new products, Goncalves said he lets the clock tick longer on these projects.

Christopher Seiwald, founder of software configuration tools vendor Perforce Software Inc., takes that philosophy a step further: He simply keeps tuning until the familiar tick-tock-trickle of customer complaints fades away. Most software, he says, “is still just released onto the population and then revved until these complaints go away.”

And if you’re not responding to complaints, but rather addressing an unwieldy piece of unimproved code left over from version 1.0, Seiwald suggests relying on the so-called un-optimized rule of thumb—stop tuning when you’ve doubled your performance.

Decisions about when to start and stop tuning change as the computing environment changes. It wasn’t long ago that people wrote programs to run on single machines. “There were standard operating systems and tools—the tools produced a simple little report that showed where time was being spent—that everybody used,” said Alex Aiken, a Stanford University computer science professor.

Today, the overarching trend is toward networked, distributed applications. Trying to improve performance the old-fashioned way, by tuning individual applications or machines in a networked chain, can be a mixed bag.

“Monitoring distributed parallel systems is still largely a black art,” said Aiken, whose research interests include tools for detecting errors and checking software specifications, and static program analysis. “While there is active research on the topic, in practice people are still really rolling their own infrastructure.”

Building a Solution
The problem with rolling your own, with undertaking complex integration tasks, is that performance degrades with each boundary crossing in the computing environment. It’s a problem that’s piqued the interest of Mark Wegman, chief technology officer at IBM Research.

In the late 1960s and early 1970s, computing environments were much simpler and programmers worked close to the machine. “In these situations, it was reasonable for a human being to select the right algorithms,” said Wegman, co-inventor of the flow analysis algorithms used in most modern optimizing compilers. “Today, there are so many more boundaries. And people just don’t bother to optimize across all of them.”

Programmers and architects often use software components from “elsewhere” to solve problems. The component writer didn’t know the exact context in which his code would be used. And the component’s algorithms are hidden from the user—that’s what information hiding is about.

To bridge this disconnect, “suppose software components could be written with several different algorithms for completing a given task,” said Wegman. “Somewhere—maybe in the virtual machine, operating system or compiler—the right algorithm would be selected automatically for the environment.”
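A toy version of that idea is sketched below: a lookup component that carries two search algorithms and picks one based on how much data it is given. It is only a rough analogy. Wegman envisions the choice being made by the virtual machine, operating system or compiler, whereas here the component chooses for itself, and the class name and size cutoff are assumptions.

```python
import bisect


def _linear_search(items, target):
    """Scan every element; fine for short lists."""
    return target in items


def _binary_search(sorted_items, target):
    """Halve the search range each step; requires sorted input."""
    index = bisect.bisect_left(sorted_items, target)
    return index < len(sorted_items) and sorted_items[index] == target


class Lookup:
    """Hypothetical component that hides which algorithm it uses."""

    SMALL_THRESHOLD = 32  # assumed cutoff; a real system would measure

    def __init__(self, items):
        self._items = list(items)
        if len(self._items) <= self.SMALL_THRESHOLD:
            self.strategy = "linear"
        else:
            self._items.sort()
            self.strategy = "binary"

    def __contains__(self, target):
        if self.strategy == "linear":
            return _linear_search(self._items, target)
        return _binary_search(self._items, target)


if __name__ == "__main__":
    small = Lookup([3, 1, 2])
    large = Lookup(range(1000))
    print(small.strategy, 2 in small)
    print(large.strategy, 999 in large)
```

Callers use `in` the same way in both cases, so information hiding is preserved while the algorithm still adapts to its context.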

