Larry O’Brien: Fun With Compilers...Really




September 1, 2007
I used to think that writing a compiler was the most fun you could have programming. I now amend that. Writing a unit-tested compiler is the most fun you can have programming.

Compilers and translators are not often required to solve business needs, but not every data source produces library-parsable XML, and the tools needed to transform structured-but-untagged data are the same ones that are used to emit code. And while the topic of parsing text is deep enough to warrant thick books, the types of input that a business developer is likely to face are usually easily tackled with modern tools. When a translation or code-generation problem arises in discussion, your pulse should quicken from excitement rather than fear.

I think the best modern tool for parsing is ANTLR (www.antlr.org), recently updated to a long-awaited third version. ANTLR is a long-term project from the University of San Francisco’s Terence Parr and can handle almost any grammar thrown at it (technically, it generates arbitrarily deep LL look-aheads and memoizes them for performance in a packrat-like manner). ANTLR has a fantastic IDE called ANTLRWorks (developed primarily by Jean Bovet) and a text-generation library called StringTemplate. Each component has a learning curve, and the overall system is complex enough that even the boldest would be wise to work their way through the time-honored four-function calculator example. This “Hello, World” of parsing is used in Parr’s new book “The Definitive ANTLR Reference,” published by The Pragmatic Programmers, and unequivocally necessary as a supplement to the ANTLR documentation and wiki.
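For readers who have never seen the four-function calculator, here is a minimal sketch of what the exercise covers. It is a hand-written recursive-descent evaluator in Python, not ANTLR-generated code and not the example from Parr’s book; the names tokenize, Parser and evaluate are my own, chosen only for illustration. The grammar rules appear as comments above the methods that implement them.

```python
import re

# One token is either a number or a single non-space character.
TOKEN = re.compile(r"\s*(?:(\d+(?:\.\d+)?)|(.))")

def tokenize(text):
    """Lexer: turn the input into ('num', value) and ('op', char) tokens."""
    tokens = []
    text = text.strip()
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        number, char = match.groups()
        if number is not None:
            tokens.append(("num", float(number)))
        elif char in "+-*/()":
            tokens.append(("op", char))
        else:
            raise SyntaxError(f"unexpected character {char!r}")
        pos = match.end()
    return tokens

class Parser:
    """Parser: one method per grammar rule, evaluating as it goes."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("end", None)

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # expr : term (('+' | '-') term)*
        value = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            op = self.take()[1]
            right = self.term()
            value = value + right if op == "+" else value - right
        return value

    def term(self):
        # term : atom (('*' | '/') atom)*
        value = self.atom()
        while self.peek() in (("op", "*"), ("op", "/")):
            op = self.take()[1]
            right = self.atom()
            value = value * right if op == "*" else value / right
        return value

    def atom(self):
        # atom : NUMBER | '(' expr ')'
        kind, value = self.take()
        if kind == "num":
            return value
        if (kind, value) == ("op", "("):
            inner = self.expr()
            if self.take() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return inner
        raise SyntaxError(f"unexpected token {value!r}")

def evaluate(text):
    """Lex, parse and evaluate one arithmetic expression."""
    parser = Parser(tokenize(text))
    result = parser.expr()
    if parser.peek()[0] != "end":
        raise SyntaxError("trailing input")
    return result
```

With this sketch, evaluate("1 + 2 * 3") returns 7.0 and evaluate("(1 + 2) * 3") returns 9.0. A parser generator earns its keep by producing the lexer and the rule methods from the commented grammar lines alone.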

Although I’ve used ANTLR in the past, only with the new release have I used it to tackle a production problem—handling a quite complex mainframe output with decades of accreted special-case quirks and codes. As is not uncommon, my clients felt locked in to a supplier who charged them tens of thousands in annual license fees, confident that no one would breach the barrier to entry of parsing the mainframe data. When client discussions involve people saying things like “If you look at line 429,327, you’ll see an example of the problem,” such confidence might have been justified in the past.

If you haven’t written a compiler in the managed era, you probably think of such development as something akin to rebuilding an engine: You work blindly for long stretches when things won’t even start. It’s only fairly late in the process that you experience the joyous emergent characteristic of a program blossoming into functionality. With ANTLR and a unit-testing framework, there is no period without feedback: you can unit-test both your front-end analysis and your back-end generation from the bottom up, snapping them together as you go and relying on your lower-level tests to catch any mistakes. While you may forgo the Frankensteinian elation of a parser coming to life after endless error messages, when you put a “+” at the end of a grammar rule and chew through a multimegabyte input on the first run, you may experience sudden head-swelling.
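To make the bottom-up idea concrete, here is a small sketch in Python using the standard unittest module. Everything in it is hypothetical: the fixed-width LAYOUT, the field names, and the functions parse_record, emit_xml and translate are invented for illustration and have nothing to do with ANTLR’s API or my client’s data. The point is only that the front end and the back end each get their own tests before the two are snapped together.

```python
import unittest
from xml.sax.saxutils import escape

# Hypothetical fixed-width layout: (field name, start column, end column).
LAYOUT = [("account", 0, 6), ("name", 6, 16), ("balance", 16, 24)]

def parse_record(line):
    """Front end: slice one fixed-width line into a dict of stripped fields."""
    if len(line) < LAYOUT[-1][2]:
        raise ValueError("record too short")
    return {name: line[start:end].strip() for name, start, end in LAYOUT}

def emit_xml(record):
    """Back end: render a parsed record as a single XML element."""
    fields = "".join(
        f"<{key}>{escape(value)}</{key}>" for key, value in record.items()
    )
    return f"<record>{fields}</record>"

def translate(line):
    """The two halves snapped together."""
    return emit_xml(parse_record(line))

class FrontEndTest(unittest.TestCase):
    def test_fields_are_sliced_and_stripped(self):
        line = "001234" + "SMITH     " + "  250.75"
        self.assertEqual(
            parse_record(line),
            {"account": "001234", "name": "SMITH", "balance": "250.75"},
        )

    def test_short_record_is_rejected(self):
        with self.assertRaises(ValueError):
            parse_record("001234SMITH")

class BackEndTest(unittest.TestCase):
    def test_emits_escaped_xml(self):
        # The record is built by hand, so generation is tested without the parser.
        record = {"account": "1", "name": "A&B", "balance": "0.00"}
        self.assertEqual(
            emit_xml(record),
            "<record><account>1</account><name>A&amp;B</name>"
            "<balance>0.00</balance></record>",
        )

class EndToEndTest(unittest.TestCase):
    def test_line_becomes_xml(self):
        line = "001234" + "SMITH     " + "  250.75"
        self.assertEqual(
            translate(line),
            "<record><account>001234</account><name>SMITH</name>"
            "<balance>250.75</balance></record>",
        )
```

Saved as a module, the suite runs with python -m unittest. If the end-to-end test fails while the two lower-level classes pass, the mistake is in how the halves were joined, which is exactly the kind of feedback the old engine-rebuilding style never gave.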

Unit-testing a compiler under construction will, however, expose you to the most frustrating aspect of unit-testing: scaffold shattering. Testing suites are filled with mock objects and data structures manually stitched together; no matter how trivial, a systemic change to the data structures will often break dozens of tests. Those not convinced of the long-term benefits of unit-testing (surprisingly, such skeptics still exist) will be tempted to abandon the suite or comment out large swaths of tests (“Of course the child-node count is 12”). The punctilious will be tempted to refactor the suite toward meta-testing (“Gee, I could generate the construction of mock objects…”). I don’t like test suites refactored toward abstraction; one of the primary purposes of a test suite is as an aid to comprehending the system under test, and I think it’s important that tests take as straight a path as possible between instantiation and system test. Personally, I would rather pay the price of an occasional multihour fixup during development than the price of a difficult-to-understand suite in 18 months.

ANTLR and ANTLRWorks are themselves Java applications, but code generation is abstracted with the StringTemplate library, and lexers and parsers can be generated in a large number of target languages. Unfortunately, at the moment C#, Python and Ruby code generation lags behind Java; none is yet up to snuff when generating the tree parsers necessary for real language implementation. So the use of ANTLR to write a compiler for the Dynamic Language Runtime is probably a few months away. I wonder who will be the first to try?

Larry O’Brien is a technology consultant, analyst and writer. Read his blog at www.knowing.net.


 
