
Concurrency: The Compiler Writer’s Perspective


Google’s Brian Grant discusses the problems of parallelism



November 15, 2007
It’s rare to find concurrency practitioners with the right mix of technical insight and commercial software success. Brian Grant is a former software architect at PeakStream, which was acquired in June by Google. Though he’s tight-lipped about his current work, his concurrency background extends beyond PeakStream to include work at Lawrence Livermore National Labs on massively parallel machines.

PeakStream grew out of the Brook stream programming language for graphics processing units (GPUs), which originated at Stanford University. PeakStream’s product suite, made up of APIs, a virtual machine, a system profiler and a just-in-time compiler, enables multithreaded applications to run on a panoply of hardware (multicore x86 processors, CPUs enhanced with GPUs, and possibly IBM’s Cell). Before the acquisition by Google, PeakStream had aimed its multicore programming product at industries with obvious interest in high-performance computing: oil and gas, defense, finance and academia.

All this gives Grant a unique view on what makes concurrency hard, and what will—or won’t—make it easier. Though our conversation grazed only the surface of these issues, it was useful to gain a compiler writer’s perspective on the problems of parallelism.

SD Times: Is concurrency too difficult for developers accustomed to linear programming to grasp?

Brian Grant: It is challenging, but I don’t think it’s too challenging. I do think certain languages, especially C and C++, make it very challenging to write robust big multithreaded systems. Basically, they have features that are hostile to concurrency.

So the nondeterminism inherent in concurrency isn’t a deal-breaker?

I don’t think so, not by itself. There are tools and methodologies that can help developers cope with nondeterminism. However, many other features in the languages are problematic: for example, global and static variables in C and C++. In particular, if you’re coming from legacy code where you may not have considered the impact of concurrency, you may have lots of areas that will be problematic.

The key is managing the complexity by imposing some kind of discipline. As with all software, you need to break it into components, layers and well-defined interfaces. You have to access shared data in a disciplined way. C, C++ and current compilers are inadequate to help in that way.
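
To illustrate the kind of discipline described above, here is a minimal C++ sketch that is not drawn from the interview. It assumes C++11's std::thread and std::mutex, which were standardized after this conversation took place, and the names HitCounter and count_in_parallel are hypothetical. Instead of a bare global counter that any thread can modify, the shared state sits behind a small interface that always takes a lock.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical example: the count is private, so the only way to reach it
// is through member functions that hold the mutex.
class HitCounter {
public:
    void increment() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++count_;
    }

    long value() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return count_;
    }

private:
    mutable std::mutex mutex_;  // mutable so value() can lock while const
    long count_ = 0;
};

// Starts num_threads threads that each call increment() per_thread times,
// waits for all of them, and returns the final count.
long count_in_parallel(int num_threads, int per_thread) {
    assert(num_threads >= 0 && per_thread >= 0);
    HitCounter counter;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                counter.increment();
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return counter.value();
}
```

With the lock in place, count_in_parallel(4, 10000) returns 40000 on every run. If the counter were an unprotected global long, concurrent increments could be lost and the total could differ from run to run.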

Is it possible, as some say, that developers could simply ignore this issue and hope concurrency could be managed in the background, automatically?

We’re pretty far away from automatic concurrency or parallelizing compilers. The analysis that the system would have to do on the code would be pretty heroic. There are tools that can help, however.

Is OpenMP the best approach, then?

OpenMP is effective for a very limited set of applications: parallelizing hot kernels in numeric and scientific computing. OpenMP doesn’t fundamentally address the hard problems of partitioning larger codebases. It works for well-behaved array accesses.
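
As a rough illustration of the "hot kernel" case, the sketch below applies an OpenMP directive to a loop with well-behaved array accesses. It is an assumed example, not code from PeakStream or the interview, and the function name saxpy is simply the conventional name for this operation. Each iteration reads and writes only index i, so the iterations are independent and the runtime can divide them among threads.

```cpp
#include <cassert>
#include <vector>

// Computes y[i] = a * x[i] + y[i] for every element.
// No iteration depends on another, which is the situation the
// "parallel for" directive is designed for.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const long n = static_cast<long>(x.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The directive takes effect only when OpenMP is enabled (for example, with -fopenmp on GCC or Clang); otherwise the pragma is ignored and the loop runs serially with the same result. Note what the directive does not do: it does not help decide how a large codebase should be partitioned, which is the harder problem raised above.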

Libraries that hide internal parallelism from developers, such as the PeakStream Platform, are another approach that can also work well in certain problem domains. At PeakStream we made a system that made it easy to write software for multicore chips. On multicore x86 CPUs, the PeakStream Platform parallelized array operations to run on multiple cores and also vectorized them to take advantage of x86 vector units. One feature the PeakStream Platform had that OpenMP did not was the ability to accelerate operations by running them in parallel on a GPU.

Does your experience in compilers give you a different perspective than the average developer?

It probably does. I’ve been working in compilers for more than 10 years, so I know how hard some of the problems are, but I can also imagine new analytical tools and language features that would make concurrency easier.

Existing languages such as C++ tend to favor ease of performance over ease of correctness. Higher-level languages favor ease of correctness over ease of performance. For example, in functional languages such as Haskell, the code you write does not have any side effects. The model is that they don’t modify existing data; they generate new data. The advantage is that it’s easier for the compiler to reason about what different components of the code can do.

In a multithreaded C program, you could have two threads touch the same piece of data simultaneously, creating a race condition. In a functional program, that’s impossible because they each create new data. However, it becomes a performance problem. How do you ensure that the compiler doesn’t create excessive amounts of new data? In C and C++, it’s easier to be efficient because it’s easy to reuse and share data.
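
The trade-off described above can be sketched in C++ itself. This is an assumed illustration with hypothetical function names, not a claim about how any functional-language compiler works. The first function follows the functional model: it never modifies its input and pays for a new vector on every call. The second reuses the caller's storage, which is cheaper but means any other thread reading the same vector at that moment would be racing with the writes.

```cpp
#include <cassert>
#include <vector>

// Functional style: the input is read-only and a freshly allocated
// result is returned, so concurrent readers of `in` are unaffected.
std::vector<int> scaled_copy(const std::vector<int>& in, int factor) {
    std::vector<int> out;
    out.reserve(in.size());
    for (int v : in) {
        out.push_back(v * factor);
    }
    return out;
}

// Imperative style: the caller's data is overwritten in place.
// No allocation, but the data is shared and mutated.
void scale_in_place(std::vector<int>& data, int factor) {
    for (int& v : data) {
        v *= factor;
    }
}
```

Both functions compute the same values. The difference lies in who owns the result and whether existing data changes underneath other readers.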

What about interpreted languages?

Java does seem to be ahead of C++. There’s a notion of threads and concurrency built in, whereas comparable support is just now being addressed in C++.

Are GPUs or the Cell processor the answer for increased performance with fewer concurrency hassles?

Not for everyone. The performance characteristics are very different. Certain applications run extremely fast; others don’t. They’re also much harder to program—the architectures expose a lot more complexity to software.

In Cell, for example, there are two different instruction sets you have to worry about.

The vector units on Cell have to be programmed in a different way from, say, x86 CPUs. They can’t transparently load data from main memory; they have to do explicit DMAs [direct memory accesses] from main memory into local on-chip memory for both code and data. The vector units don't have a general purpose instruction set. Like GPUs, the vector units need to be programmed differently from mainstream CPUs.

I don’t think the x86 instruction set is the problem.

Who are you looking to for thought leadership on concurrency?

That’s a good question. There’s a variety of interesting work out there, but I don’t know that I’ve seen any compelling or complete solutions yet. Concurrency needs to be well supported throughout the whole software ecosystem: languages, tools, libraries and legacy code. The C++ standards work going on is a good thing. I think the work on annotations to specify thread-safety properties is promising. Research on transactional memory is interesting, but it’s not clear how it will pan out. Exploring functional languages is interesting, but achieving good performance is challenging. Viable industrial solutions based on the more experimental technologies are a number of years off.

Alexandra Weber Morales is the former editor-in-chief of Software Development magazine.

