Thursday, February 08, 2007

So many compilers, so much retardation

I have this ongoing programming project that analyzes and hacks on the OBJs spit out by compilers and assemblers. Its design goal is to be able to read in anything that was produced to the Intel/TIS OMF spec and variants thereof (like PharLap's "Easy OMF"): 16-bit code, 32-bit code, mixed 16/32-bit code, etc.
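"Read in anything produced to the OMF spec" starts with the one thing every OMF record shares: a one-byte record type, a two-byte little-endian length, the contents, and a trailing checksum byte. So a tolerant reader can begin by just splitting the byte stream into records and checking the framing. The sketch below is a minimal Python illustration, not this project's actual code; the names `OmfRecord`, `iter_records`, and `make_record` are hypothetical. It accepts a checksum byte of 0 because the TIS spec notes that some translators write 0 there instead of computing a real checksum.

```python
from typing import Iterator, NamedTuple


class OmfRecord(NamedTuple):
    rectype: int  # record type byte, e.g. 0xA0 = 16-bit LEDATA
    body: bytes   # record contents, without the trailing checksum byte


def make_record(rectype: int, body: bytes) -> bytes:
    """Frame `body` as an OMF record (hypothetical helper, handy for tests)."""
    length = len(body) + 1  # length field counts the contents plus the checksum
    head = bytes([rectype]) + length.to_bytes(2, "little") + body
    checksum = (-sum(head)) % 256  # makes all record bytes sum to 0 mod 256
    return head + bytes([checksum])


def iter_records(data: bytes) -> Iterator[OmfRecord]:
    """Split a raw OMF byte stream into records, validating the framing."""
    pos = 0
    while pos < len(data):
        if pos + 3 > len(data):
            raise ValueError(f"truncated record header at offset {pos}")
        rectype = data[pos]
        length = int.from_bytes(data[pos + 1:pos + 3], "little")
        end = pos + 3 + length
        if length == 0 or end > len(data):
            raise ValueError(f"bad record length {length} at offset {pos}")
        record = data[pos:end]
        # A checksum byte of 0 means "not computed"; accept it as-is.
        if record[-1] != 0 and sum(record) % 256 != 0:
            raise ValueError(f"checksum mismatch at offset {pos}")
        yield OmfRecord(rectype, record[3:-1])
        pos = end
```

For example, `list(iter_records(make_record(0x88, b"\x00\x00") + make_record(0x8A, b"\x00")))` yields a COMENT record followed by a MODEND record. Everything above this layer (which record types exist, 16- vs 32-bit variants, vendor extensions) can then dispatch on `rectype`.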

This is the Grand Unification Theory of OBJ post-processors. It will be able to handle anything from a 1981 vintage version of Microsoft's Macro Assembler, to the latest 2006 version of Borland C++ Builder, and "do things" to their output.

As I've been working along and testing stuff produced by dozens of different compilers and assemblers, the one thing I've noticed is that virtually ALL of them, from the inception of the PC in '81 to the current day 26 years later, either have code generation bugs or produce output that, while it may work, is profoundly retarded in one way or another.

The latest thing that set me off on this rant is Microsoft MASM and Borland's TASM. Both of these assemblers (and SLR's Optasm) can generate some truly horrendous OMF in some situations, bloating the OBJ to as much as 10X the size it needs to be. 10X is significant. If you're specifying, say, a 20K segment full of stuff and building it 2 bytes at a time by emitting 10,000 records, that's going to have an impact on linking performance compared to 20 records of 1K each, particularly when each record adds 5, 7, or 8 bytes of overhead (above the two bytes for the data itself).
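The arithmetic behind that complaint is easy to check. Assuming 16-bit LEDATA records (type 0xA0) with a one-byte segment index, each record carries 7 bytes of framing: type (1), length (2), segment index (1), data offset (2), and checksum (1); a two-byte segment index makes it 8. The helper below is a hypothetical sketch that counts only LEDATA framing and ignores every other record type in the OBJ.

```python
def ledata_stream_size(total_bytes: int, chunk: int,
                       seg_index_len: int = 1, offset_len: int = 2) -> int:
    """Bytes needed to emit `total_bytes` of segment data in `chunk`-sized
    LEDATA records. Defaults assume 16-bit LEDATA with a 1-byte segment index."""
    # type(1) + length(2) + segment index + data offset + checksum(1)
    overhead = 1 + 2 + seg_index_len + offset_len + 1
    records = -(-total_bytes // chunk)  # ceiling division
    return total_bytes + records * overhead


two_at_a_time = ledata_stream_size(20 * 1024, 2)        # 10,240 records
one_k_at_a_time = ledata_stream_size(20 * 1024, 1024)   # 20 records
print(two_at_a_time, one_k_at_a_time)  # 92160 20620
```

So the same 20,480 bytes of segment data cost about 92K of data records when emitted two bytes at a time versus about 20K when emitted in 1K chunks, and the linker has to parse 10,240 records instead of 20.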

Fixing a lot of this dumb stuff is, in many ways, the classic textbook producer/consumer scenario: take in streams of retarded crap and produce streams of optimized, less retarded crap. The devil, as always, is in the details. There are so many ways these compilers do things inefficiently that it's a lot of testing work just discovering the retardations, let alone fixing them.
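One of those structural fixes fits the producer/consumer shape directly: consume a stream of tiny data records and produce fewer, bigger ones. Below is a minimal sketch, assuming records have already been decoded into hypothetical `(segment_index, offset, data)` tuples and that no FIXUPP record sits between them. A FIXUPP applies to the LEDATA record immediately before it, so a real merger would have to rewrite fixup offsets or stop merging at that boundary. Merging also stops at the 1,024-byte data limit the Intel/TIS spec places on a single LEDATA record.

```python
from typing import Iterable, Iterator, Tuple

MAX_LEDATA_DATA = 1024  # per-record data limit for LEDATA in the Intel/TIS spec

# (segment index, starting offset, data bytes): a simplified, hypothetical form
Chunk = Tuple[int, int, bytes]


def coalesce(chunks: Iterable[Chunk]) -> Iterator[Chunk]:
    """Merge runs of contiguous chunks in the same segment, up to the limit."""
    seg = None
    start = 0
    buf = bytearray()
    for s, off, data in chunks:
        contiguous = s == seg and off == start + len(buf)
        if contiguous and len(buf) + len(data) <= MAX_LEDATA_DATA:
            buf += data  # extend the pending record
            continue
        if buf:
            yield (seg, start, bytes(buf))  # flush the pending record
        seg, start, buf = s, off, bytearray(data)
    if buf:
        yield (seg, start, bytes(buf))
```

Feeding it 10,240 two-byte chunks for one segment produces 20 chunks of 1,024 bytes each. Because it is a generator, it can sit between a record reader and a record writer without buffering the whole OBJ.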

And I'm not even into the peephole optimizer work yet ;-> I'm still in the phase of structural reorganizations and optimizations on the OMF records. It's taken vast quantities of work just to get to where I can think about starting on the really interesting stuff.

2 comments:

Mike said...

Great! I am taking an assembly language class that uses nothing but MASM! But then, it does seem easy enough, so maybe it's a good, gentle introduction...

Purple Avenger said...

Recent versions of MASM are fairly reliable, not nearly as buggy as some of the ones from the early-to-mid '80s.

As much as I bitch about them, MASM and TASM are both quite usable.