Saturday, November 25, 2006

Software testing - peeling the onion

Suppose you've got some old software that's getting a face lift. This happens all the time. The old stuff appeared to work perfectly in its old task, but now some new demands are being made on it.

Code paths previously untraveled are now getting executed. Perhaps that unexecuted code was put in to conform to some standard that no compiler of the day emitted in its totality (in this case Intel's OMF86, of which the language tools targeted at PCs only ever generated a subset). When all you're working with is a paper spec and no tool that actually generates certain aspects of that spec, well...shit happens...or in this case, shit happened.

The old Lattice C compilers never generated threaded fixup records in their OBJ files, but the code in question includes an OMF loader that purports to recognize threaded fixups because...well...they were part of the Intel OMF spec.

You the reader don't need to understand what a threaded fixup is; only people writing compiler back ends, linkers, and maniacs like me need to know such things in detail. What you do need to understand, though, is that the threaded fixup loading code had a bug. That bug, which I injected into the code back in the early '80s, went undetected for some 20+ years because the program worked perfectly with the compiler outputs it was originally designed to work with. It was a latent bug.

So I'm chugging along adding the recognition/identification support for a lot of other compilers besides the old Lattice one: Computer Innovations C86, Lattice 3.x versions, all the Borland stuff from Turbo C 1.0 onward. Finally I get to the early Microsoft C compiler.

BANG. Microsoft generates threaded fixups. The latent bug rears its ugly head and the world collapses around me.

OK fine, it's going to be that way, eh? A couple of days of poring over old OMF docs, head scratching, getting back up to speed on what was previously imagined to be a flawless piece of software, and jacking around with a debugger, and the latent bug is fixed and extinguished. Unfortunately it wasn't an instant-death kind of bug. It was one of those silent killers where the stuff just wasn't working right and you didn't find out about it until thousands of instructions and dozens of function calls later, when the bad data finally caused some problem. Backtracking those to their origin is always a PITA...but I digress.

I thought that was it. It wasn't. Now that output from the Microsoft C compiler can be loaded/parsed/recognized, more problems with fixups surface because the Microsoft "huge" model is generating a form of fixup the code wasn't designed to handle. No big deal. That one was pretty easy to add. Dig up some docs on Microsoft fixups and 15 minutes later that one is nailed.

The point here is that I'm apparently going to be peeling the onion a lot in this endeavor. Even in apparently well-functioning code, there will be a lot of latent bugs because of test case gaps, spec confusion, etc. Behind every bug you see, many more may be lurking, and they only show themselves once that one is fixed.

I've seen test/development managers dupe the higher-level managers by shaping presentations of defect status. My original 20+ year old bug wasn't just "one bug"; I know it's going to turn into many bugs over the next few days.

If you report only on "failed" test cases, without saying how many variations are blocked by that "one bug", it's hard to get a handle on how many layers of the onion you'll be peeling once that "one bug" has been "fixed".

This becomes particularly embarrassing as a product nears its ship date, when the "difficult" problems that were put off, and that blocked hundreds of variations, finally get fixed and the defect count suddenly spikes rather than going down. An astute manager will want to correlate failed test cases against blocked variations so a SWAG at the spike might be taken, or executives at least warned that such a phenomenon might occur.
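For what it's worth, that correlation can be roughed out in a few lines. This is only a sketch: the defect names echo the two fixup problems above, but every number here, including the "new defects per 100 blocked variations" rate, is a made-up placeholder that you would replace with your own historical data.

```python
from dataclasses import dataclass


@dataclass
class Defect:
    name: str
    failed_cases: int        # test cases reported as failing because of this defect
    blocked_variations: int  # variations that cannot run until this defect is fixed


def swag_spike(defects, new_defects_per_100_blocked):
    """Rough guess at how many new defects surface once the blockers are fixed.

    new_defects_per_100_blocked is an assumed rate taken from past releases;
    it is not something this code can derive for you.
    """
    blocked = sum(d.blocked_variations for d in defects)
    return blocked * new_defects_per_100_blocked / 100


# Hypothetical status report: two "single" bugs hiding a lot of untested ground.
defects = [
    Defect("threaded fixup loader", failed_cases=1, blocked_variations=200),
    Defect("huge model fixup", failed_cases=1, blocked_variations=40),
]

reported = sum(d.failed_cases for d in defects)
blocked = sum(d.blocked_variations for d in defects)
expected_spike = swag_spike(defects, new_defects_per_100_blocked=5)

print(f"reported failures: {reported}")
print(f"blocked variations: {blocked}")
print(f"SWAG at new defects after fixes: {expected_spike}")
```

The status slide that shows only the first number is the one that dupes people. Putting the blocked count and the guess next to it is what gives the executives fair warning.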

Anyone who has shipped tens of millions of lines of code over dozens of releases will eventually develop some historical data that will give them insight into the nature of the SWAG they need to make.
