Most Read Latest News Blog Resources

Guest View: Java + multicore = good news




October 15, 2008 — 
Dual-core processors are ubiquitous in desktop computers and laptops today; four to eight cores are common in servers, and within the lifetime of the next version of the JDK (Java 7), we are likely to see servers with dozens of cores become available in bulk.

Moore’s Law has given us unimaginable increases in computing power for more than over a generation. But a few years ago—as CPU thermal densities approached that of a small furnace—the strategy of cramming ever smaller transistors on chips that run at ever increasing frequencies seemed destined to hit a wall.

So chip designers pulled a new trick out of their sleeves: They leveraged the increasing transistor densities to fit more cores onto a single chip. While Moore’s Law continues to fit more transistors on chip, it is no longer a given that each new generation of CPUs will automatically run our existing algorithms faster.

It is now up to us to ensure that our code is ready to take advantage of the new generation of multicore CPUs. Our current theory, algorithms, languages and tools have all evolved around the sequential paradigm. The move to a truly parallel world will be a fundamental shift.

At first glance, the task of parallelizing our applications looks no different from multithreaded programming. The Java language is widely praised for having a simple yet powerful threading model that makes writing effective multithreaded applications mainstream. It may therefore seem likely that the ease of writing multithreaded applications will enable us to surmount the challenge of writing multicore applications, but in practice it may not be so easy.

The canonical multithreaded application in Java (such as the Web or application server) processes multiple users or connections concurrently, in multiple threads. That architecture is not only well understood, it is also very well supported by underlying frameworks, whether Web or application servers, to the extent that the average programmer hardly has to think about this issue. The complex tasks for thread pooling and scheduling are efficiently handled, and effectively hidden, by the framework. All we need to do as application programmers is write out servlet or Enterprise JavaBeans on the correct template, and we’re good to go.

In this paradigm, processing for each individual user/connection occurs sequentially. That works very well for certain tasks, and therefore Web programming will be able to leverage such processors more effectively. But since multicore will be ubiquitous, from the smallest processors upward, and will scale up to hundreds or even thousands of cores in a server, we will have to code our algorithms actively to convert them from sequential to parallel ones in order to ensure effective utilization of the processor.

Parallelizing an algorithm, however, will soon run into Amdahl’s Law, which states that the speedup from parallelizing a process is an inverse function of the portion of the process that is necessarily sequential. Thus, not only do we need good algorithms to minimize the sequential portion of a process, but there is also an upper bound to the speedup we can achieve, even with an infinite number of parallel processors.

While that may be a depressing result, parallel processing has been rescued by Gustafson’s Law, which states that while the speedup for a fixed-size process is indeed limited by Amdahl’s Law, we can achieve nearly linear speedups if we consider the amount of processing that can be done in a fixed period.

The next challenge in writing such programs will be in the area of correctness and testing. When multiple processes execute simultaneously on multiple processors (as opposed to the scheduling of threads on a single processor), determinism goes out the window with respect to race conditions. How to test programs that behave correctly when run 1,000 times but hit a race condition when run for the 1,001st time is a question that has touched anyone who has programmed on a multiprocessor box. As the number of independently executing processes or threads rises with large multicore processors, the probability of hitting strange race conditions increases exponentially.

Help may be available in the form of static code analyzers or in the use of frameworks that guarantee certain concurrency primitives.

Our discussion of parallel computation thus far might apply to applications running in distributed fashion on clusters of machines. In fact, it is likely that large-scale computing will run in a mixed multicore/multiserver environment. And while those paradigms have many similar issues relating to parallelizing algorithms, they have very different latency and bandwidth characteristics. Therefore, it may not be easy to manage distributed algorithms optimally in hybrid systems consisting of clusters of computers running multicore processors.

The ideal support for multicore programming would be a parallelizing compile—one that takes a piece of sequential code, analyzes it for parallelism and then writes out code that is executed in parallel. While a lot of research has been done in this area, there is no general-purpose tool that works for all kinds of computing tasks, yet guarantees high efficiency and correctness. The Holy Grail remains far out of reach.

So while programmers still have to convert their algorithms manually to distributed ones, it would be of immense help to base those algorithms on well-known, and well-implemented, concurrency primitives. One such primitive is the Java fork/join framework, created by Doug Lea under the auspices of the JSR166 expert group.

Described as a multicore-friendly, lightweight parallel framework, fork/join uses the strategy of recursively splitting a task into smaller subtasks; forking the subtasks into separate processes or threads, so that they run in parallel on multiple cores; and joining all subtasks to compose a result to return. Expected to be added to Java 7, it will boost the ease of writing parallel programs in Java.

The code listing shown here demonstrates a simple parallel algorithm to compute the Fibonacci numbers. On a two-core machine, it achieves a nearly twofold speedup over an equivalent sequential algorithm:

import jsr166y.forkjoin.*;

class Fib extends RecursiveTask<Integer> {
         static final int threshold = 10;
         volatile int number;
         Fib(int n) { number = n; }
         public Integer compute () {
                 int n = number;
                 if (n <= threshold)
                         return sequentialFib(n);
                 else {
                         Fib f1 = new Fib(n - 1);
                         f1.fork();
                         Fib f2 = new Fib(n - 2);
                         return f2.forkJoin() + f1.join();
                 }
         }
         public static void main(String[] args) {
                 try {
                         int groupSize = 2; // number of CPUs
                         ForkJoinPool group =
                                 new ForkJoinPool(groupSize);
                         Fib f = new Fib(40);
                         Integer result =group.invoke(f);
                         System.out.println(“Fibonacci Number: “ + result);
                 }
                 catch (Exception ex) {}
         }
         int sequentialFib(int n) {
                 if (n <= 1) return n;
                 else return sequentialFib(n-1) + sequentialFib(n-2);
         }
}

It is noteworthy that the program strives to describe the algorithm based on the fork/join primitives; the details of threading and scheduling are hidden by the framework. Notice also that the program moves to sequential computation for problems below a certain size. This kind of tuning is usually required to produce optimal efficiency from distributed algorithms.

The multicore boom ensures that interesting times lie ahead for all programmers. Creating successful and innovative solutions in this environment requires our awareness of the attendant issues, as well as the tools and technologies that help us navigate them.

Avik Sengupta is a director at Lab49, a provider of advanced technology applications for the financial services industry.


Related Search Term(s): Javamulticoresoftware development


Share this link: http://www.sdtimes.com/link/32943
 

Comments

02/11/2009 10:09:50 AM EST

Multicore servers will face the demands of developers to replace network/LAN traffic by in-machine messages. So they will have to offer fine tuning tools for process and resource communication and prioritizing which are machine specific and OS native. Please understand that I am sceptical when we talk about Java using these interfaces as long as I do not see a valid USB library - maybe I missed there sth actual but the problems there where impressing me deeply. I think forking/unforking is only part of the problem and definetely the article does NOT describe a Java specific advantage through technology.

Germanythomas wellbrock


02/11/2009 03:45:21 PM EST

Avik Thanks for the article. It's interesting to see the Java fork/join, and contrast it with the Erlang approach. One question and one comment: Do you think Fibonacci is a good candidate for parallisation? Essentially you are doing almost double the processing effort to compute any given Fibonacci number, since the previous result is not re-used as per dynamic programming. I.e. the 2x speedup can be gained by using a better sequential implementaion. Putting aside the issue of the suitability of the Fibonacci example, I think some readers would be interested to see an Erlang solution to a parallel Fibonacci problem. It uses a parallel map implementation that I've mentioned here: http://21ccw.blogspot.com/2008/05/parallel-quicksort-in-erlang-part-ii.html, which is about 12 lines of code. And the implementaiton (without the threshold support): p_fib(0) -> 0; p_fib(1) -> 1; p_fib(N) -> [F1, F2] = pmap:pmap(fun p_fib/1, [N-1, N-2]), F1 + F2. Which basically maps a function over a list, and the function is evaluated in a separate Erlang process (NOT related to OS processes). To me this is a much simpler (and hence more reliable solution). YMMV Benjamin

United KingdomBenjamin Nortier


Add comment


Name*
Email*  
Country     


  • Comment
  • Preview
Loading



 
 
 
 
News on Monday
more>>
SharePoint Tech Report
more>>


   

 
 
Download Current Issue
ISSUE 3/15/2010 PDF

Need Back Issues?
DOWNLOAD HERE

Receive the print Edition?


 
blogs tab
Google Code turns 5
Google Code Turns 5, and adds a Paxos Algorithm to make the system more stable and reliable.
03/17/2010 11:16 AM EST

Test your Visual Studio 2010 know-how
Microsoft is offering free beta certification exams for Visual Studio 2010.
03/17/2010 11:08 AM EST

Microsoft lifts the hood on IE9
Microsoft is previewing IE9.
03/16/2010 01:10 PM EST

 

Events calendar tab
3/22/2010 to 3/25/2010
Santa Clara, Calif.
The Eclipse Foundation

4/12/2010 to 4/14/2010
Las Vegas
Penton Media

4/12/2010 to 4/15/2010
Santa Clara, Calif.
O'Reilly Media

4/19/2010
New York City
Flagg Management

4/25/2010 to 4/28/2010
Overland Park, Kans.
IIUG