
Integration Watch: Studying reliability




July 1, 2009
Consultants who specialize in jumping into sinking projects to get them back on course frequently encounter the same forms of lax discipline. These typically include a lack of good design, a lack of coding standards, a lack of code reviews, a lack of unit testing, poor QA, and, of course, a lack of basic project management skills. When called into such tasks, the first thing consultants will do is attend to the low-hanging fruit: They establish basic check-in procedures, teach the use of unit tests, start doing code reviews, and so forth. All new actions tend towards one goal: improving project reliability. Intuitively, this makes sense.

It makes sense objectively too, because we know that the earlier defects are found in the development life cycle, the faster and less expensive they are to fix. So, if you improve reliability, you improve delivery timetables and cost. Reliability inherently leads to lower costs and faster delivery. This all seems clear, reasonable, perhaps even obvious.

Most places, however, don’t work under this “obvious” relationship of speed, quality and cost. Rather, their actions reflect the glib canard frequently repeated in dev circles: “Good, fast, cheap: pick any two.” To assert an opposition among these three elements is to misunderstand how quality brings the other two along with it. Because of that misunderstanding, when projects fall behind, managers and developers typically forgo quality to gain time (and, secondarily, to cut cost).

The demands placed on development organizations today reinforce the emphasis on delivery time. Consider, for example, the surge of interest in dynamic languages during the last few years: All of them have the common goal of making it easier to belt out code quickly, despite the fact that some of their features (duck typing, for instance) have made it more difficult to assure reliability. For quick and dirty apps, or those where errors that elude basic testing are not terribly costly, this approach is sufficient.

But it leads to dangerous habits in which the connection between quality and the other two factors is slowly but inexorably eroded. Part of that fraying is that development organizations can begin to forget how to do quality work, a lapse that becomes all too evident when they have to write mission-critical software. And the result, in my view, is that important projects become terribly bogged down, because the accumulated decisions that play down quality eventually bring the project to its knees. Welcome, consultants!

One effective way for sites to reinforce a central commitment to quality is to periodically examine the workings of organizations that are dominated by the pursuit of quality. There are, in addition, methodologies for software engineering, but those are an investment of a whole different order. Sometimes, highly reliable projects provide sufficient inspiration for new ways of infusing quality.

One such project is the software for the NASA space shuttle, which, it turns out, is highly automated. The launch, the liftoff and the dumping of the fuel tanks are all completely automated. This needs to be reliable code!

The shuttle's codebase runs to 420,000 lines of code. Over the span of its 11 releases, its development team has encountered exactly 17 post-release defects. The team reaches such a high level of quality because of several factors (not the least of which are nearly unchanging requirements and a huge budget). But they have some unique practices worth emulating: Every change is thoroughly documented (and carefully reviewed) at the design stage, at the pre-coding stage, and after coding.

Every stage, including testing, is subject to intensive review. And everything is documented. The project’s internal docs are 40,000 pages long. Most places can’t duplicate this level of commitment, but they could do more careful planning before changing code, and they could do reviews of tests in addition to code reviews.

The space shuttle group does one other important thing. For every defect found, it examines the relevant docs to see how the bug got past all the checks. It then amends its process so as to close off the newfound point of ingress for defects.

Clearly, most organizations cannot do that much. But they could do something minimal that would be effective. For example, if a bug gets past unit testing, how many places have a rule that a unit test must be written to detect that bug before the bug is fixed? Few indeed. Fewer still run code reviews on the tests, and fewer still go back over their test-review procedures to determine how the bug got through.
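To make the "test before fix" rule concrete, here is a minimal sketch in Python. Everything in it is hypothetical and chosen only for illustration: the function `average`, the escaped bug (a crash on empty input), and the decision that an empty input should yield 0.0. The point is the order of work: the first test is written to reproduce the defect and is seen to fail, and only then is the guard added to the function.

```python
import unittest


def average(values):
    """Return the arithmetic mean of values, or 0.0 for an empty sequence.

    Hypothetical example: assume the earlier version divided by
    len(values) unconditionally and raised ZeroDivisionError on [].
    """
    if not values:  # the fix, added only after the regression test failed
        return 0.0
    return sum(values) / len(values)


class AverageRegressionTest(unittest.TestCase):
    def test_empty_input_returns_zero(self):
        # Written first, to reproduce the bug that escaped unit testing.
        # Against the unfixed function this test errors out.
        self.assertEqual(average([]), 0.0)

    def test_typical_input(self):
        # Existing behavior must survive the fix.
        self.assertEqual(average([2, 4, 6]), 4.0)
```

Saved in a file, the tests can be run with `python -m unittest <module name>`. Because the regression test stays in the suite, the same defect cannot quietly return, and the test itself becomes something a reviewer can inspect alongside the fix.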

A practice recommended to junior developers who want to improve is to read the code of great developers. Less frequently recommended, but perhaps more important, is to read about organizations that must put quality above all else, and to use what you learn to improve your own.

Andrew Binstock is the principal analyst at Pacific Data Works. Read his blog at binstock.blogspot.com.

