
Integration Watch: Will actors be part of your repertoire?




February 3, 2009
In my Nov. 15 column (“A new twist on threads”), I discussed the emerging importance of fine-grained parallelism. The gist of the column was that the traditional approach to parallelism, called coarse-grained parallelism, in which a specific task is assigned in toto to a single thread, leads to inefficient use of the processor. A better approach is to break up long tasks into a series of smaller tasks, then forward each of those to a thread pool that runs them efficiently and makes good use of all the underlying silicon resources.

The thread pool uses a work queue to hold these small tasks, and each task is handed to the first available thread. This design provides inherent load balancing: Every thread stays busy as long as work remains in the queue. If one thread is held up (waiting for a slow operation, such as disk I/O), it’s swapped out and another task is swapped in.

Thread pools provide an additional degree of efficiency: They determine at runtime the ideal number of threads to use. The optimal number is difficult to choose in advance with coarse-grained parallelism. For example, with a coarse-grained approach, if you break your application into six threads, then running it on a dual quad-core system (eight cores) will mean that two cores lie dormant. Thread pools, however, use all the available pipelines and distribute the smaller tasks over them. Therefore, you gain load balancing and nearly optimal usage of the processor resources.
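The idea can be sketched in a few lines. The example below uses Python's standard `concurrent.futures.ThreadPoolExecutor` purely for illustration; the function names (`sum_of_squares`, `split`, `parallel_sum_of_squares`) and the chunk size are assumptions made for this sketch, not anything from the column. Note that in CPython, threads show the structure of the technique rather than a real speedup for CPU-bound work.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(chunk):
    """One small task: process a single slice of the input."""
    return sum(n * n for n in chunk)


def split(values, chunk_size):
    """Break one long job into many small, independent chunks."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def parallel_sum_of_squares(values, chunk_size=1_000):
    # Size the pool from the machine at runtime rather than hard-coding it.
    workers = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The executor queues the chunks internally; each idle worker
        # thread takes the next one, which is where load balancing comes from.
        return sum(pool.map(sum_of_squares, split(values, chunk_size)))


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10_000))))
```

Because the work is cut into many more chunks than there are threads, no thread sits idle while another is still working through an oversized share.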

Fine-grained parallelism, however, does not free developers from the challenges of coarse-grained threads. These include mutual exclusion for shared data items and the constant threat of deadlock (in which two threads wait on each other).
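As a reminder of what that burden looks like, here is a minimal sketch of mutual exclusion around a shared counter, written in Python for illustration. `SharedCounter` and `run_workers` are hypothetical names. Deadlock arises once code needs two such locks and two threads acquire them in opposite order; that case is not shown here.

```python
import threading


class SharedCounter:
    """A shared data item guarded by a lock (mutual exclusion)."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may run this read-modify-write step.
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def run_workers(counter, thread_count=4, increments=10_000):
    """Have several threads update the same counter, then report the total."""

    def work():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

Every piece of shared data needs this kind of guard, and every guard is something the developer has to remember to add and to acquire in a consistent order.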

As a result, a new model for writing concurrent applications is emerging. It borrows a technique from academia, and its key structure is called an actor. In its simplest form, an actor is a computational entity whose primary actions are performing operations that are passed to it, passing data to other actors, and creating new actors.

The operations that an actor performs are entirely local; they have no effects on other actors. To affect other actors, the actor must pass a message to them, including the data they need or the new instructions. For example, an actor might be created to multiply two matrices. The matrices and the multiplication function are passed to the actor. The resulting matrix might then be passed to another actor that collects results and processes them further.
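The matrix example can be sketched as follows. This is a toy actor built from a thread and a queue in Python, not the API of Erlang, Scala, or any actor framework; `Actor`, `Worker`, `Collector`, `matmul`, and `multiply_via_actors` are all hypothetical names introduced for this illustration.

```python
import queue
import threading


class Actor:
    """Hypothetical minimal actor: a private mailbox drained by one thread."""

    _STOP = object()

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        # The only way to influence an actor from outside.
        self._mailbox.put(message)

    def stop(self):
        # Finish the messages already queued, then shut down.
        self._mailbox.put(self._STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                return
            self.receive(message)

    def receive(self, message):
        raise NotImplementedError


class Collector(Actor):
    def __init__(self):
        self.results = []  # touched only by this actor's own thread
        super().__init__()

    def receive(self, message):
        self.results.append(message)


class Worker(Actor):
    def receive(self, message):
        # The message carries the function, its data, and where to reply.
        func, args, reply_to = message
        reply_to.send(func(*args))


def matmul(a, b):
    """Multiply two matrices given as tuples of row tuples."""
    return tuple(
        tuple(sum(x * y for x, y in zip(row, col)) for col in zip(*b))
        for row in a
    )


def multiply_via_actors(a, b):
    collector, worker = Collector(), Worker()
    worker.send((matmul, (a, b), collector))
    worker.stop()     # returns once the worker has forwarded its result
    collector.stop()  # returns once the collector has stored it
    return collector.results
```

Calling `multiply_via_actors(((1, 2), (3, 4)), ((5, 6), (7, 8)))` sends the two matrices and the function to the worker, which forwards the product to the collector. Neither actor ever reads or writes the other's state directly.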

This parallel-programming technique relies on approaches that initially might seem peculiar. For example, it tends to avoid mutable variables (although they can be used). This approach is familiar to Java programmers from the way strings are handled. Java strings are immutable: “Modify” a single character and Java creates a new string, leaving the original untouched.
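The same behavior can be seen in Python, whose strings are also immutable; Python is used here only for illustration, and the helper names `capitalize_first` and `try_in_place_edit` are hypothetical.

```python
def capitalize_first(text):
    """Return a new string; the argument itself can never be altered."""
    return text[:1].upper() + text[1:]


def try_in_place_edit(text):
    """Attempt to overwrite one character; str does not support this."""
    try:
        text[0] = "A"
    except TypeError:
        return False
    return True


if __name__ == "__main__":
    original = "actor"
    changed = capitalize_first(original)
    # 'original' still refers to the unchanged string; 'changed' is a new one.
    print(original, changed, try_in_place_edit(original))
```

Any thread holding a reference to the original string can rely on it never changing, which is exactly the property that makes immutable data safe to share.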

Relying primarily on constants improves parallelization because data cannot be unexpectedly changed by another thread. Done correctly, an actor cannot directly change the data items inside another actor; such changes are made only by sending messages to that actor. Because of this built-in mutual exclusion, actors are a good match for parallelism. If this appears a little wild, think of REST, which is an almost exact systems analogy. (Requests and data sent as messages, no external modification of internal state save by messages, ability to run many instances in parallel without shared data, etc.)
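A small sketch shows how message passing stands in for locking. Several threads send "add" messages to one actor, which applies them one at a time on its own thread, so its count needs no lock. The class `CounterActor`, the function `count_with_senders`, and the message format are assumptions made for this Python illustration.

```python
import queue
import threading


class CounterActor:
    """Hypothetical actor whose count changes only in response to messages."""

    def __init__(self):
        self._count = 0  # private state: no lock, no outside writers
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        self._mailbox.put(message)

    def _run(self):
        # Messages are handled one at a time, so updates never interleave.
        while True:
            kind, payload = self._mailbox.get()
            if kind == "add":
                self._count += payload
            elif kind == "get":
                payload.put(self._count)  # reply on the caller's queue
            elif kind == "stop":
                return


def count_with_senders(sender_count=4, messages_each=1_000):
    actor = CounterActor()

    def sender():
        for _ in range(messages_each):
            actor.send(("add", 1))

    senders = [threading.Thread(target=sender) for _ in range(sender_count)]
    for t in senders:
        t.start()
    for t in senders:
        t.join()

    # Ask for the total by message too; the reply arrives on a private queue.
    reply = queue.Queue()
    actor.send(("get", reply))
    total = reply.get()
    actor.send(("stop", None))
    return total
```

The only synchronized object is the mailbox itself. The counter is never visible to more than one thread, so the usual lock around it disappears.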

Today, Erlang is the language with the greatest commercial acceptance that uses actor-like constructs. For Java aficionados, there is an actor framework called ActorFoundry. But, on the JVM, the best choice is the emerging language Scala, which allows you to transition to actors gradually because it also supports traditional object-oriented programming (to which actors can be added incrementally).

A similar concept in pure data management (that is, data selection and transformation) is known as dataflow, a design that was first expounded in the 1960s. It too uses message passing and adds a built-in capability to monitor the relationship between two data items, such that if one changes, the other is automatically updated (similar to a total field in a spreadsheet).
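The spreadsheet analogy can be sketched with a tiny dependency-tracking cell. This single-threaded Python illustration shows only the automatic-update idea; `Cell` and `derived` are hypothetical names and do not reflect the API of any dataflow product.

```python
class Cell:
    """Hypothetical dataflow cell: holds a value and notifies dependents."""

    def __init__(self, value=None):
        self._value = value
        self._listeners = []

    @property
    def value(self):
        return self._value

    def set(self, value):
        self._value = value
        # Push the change downstream to every cell that depends on this one.
        for listener in self._listeners:
            listener()

    def subscribe(self, listener):
        self._listeners.append(listener)


def derived(compute, *inputs):
    """Create a cell that recomputes whenever any input cell changes."""
    cell = Cell(compute(*(c.value for c in inputs)))

    def refresh():
        cell.set(compute(*(c.value for c in inputs)))

    for c in inputs:
        c.subscribe(refresh)
    return cell


if __name__ == "__main__":
    price, quantity = Cell(10), Cell(3)
    total = derived(lambda p, q: p * q, price, quantity)
    quantity.set(5)  # total is recomputed without being asked
    print(total.value)
```

The code that changes `quantity` knows nothing about `total`; the relationship is declared once and maintained by the cells themselves.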

Pervasive Software is about to release a product called DataRush that has been handling massive amounts of data in tests with only modest hardware platforms, due to its ability to leverage dataflow across all the processor cores. It’s a Java library, so, again, developers can move into this model easily.

Whether via dataflow or via actors, message-passing parallelism is likely to become more prominent during the next few years as a way to leverage the many cores in today’s PCs and servers. Now you know.

Andrew Binstock is the principal analyst at Pacific Data Works. Read his blog at binstock.blogspot.com.


 