Thursday, December 07, 2006

Microsoft MASM versus Borland TASM - two different architectures

Previously I noted you can tell a lot about a compiler's internal structure by examining the structure of the .OBJ modules it spits out. In the case of the Microsoft Macro Assembler (MASM) versus Borland's Turbo Assembler (TASM) this also turned out to be true.

MASM appears to be architecturally a multi-pass assembler, whereas TASM looks like a single-pass assembler (with some limited multi-pass capability for compressing forward jump offsets).

How do I come to this conclusion? LNAMES records. The LNAMES record(s) in a .OBJ file hold the name strings for segment names, segment group names, and segment class names.
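For readers who have not looked inside one, here is a minimal sketch of an LNAMES record in Python. It assumes the usual OMF layout: a type byte (0x96 for LNAMES), a 16-bit little-endian length that counts the body plus the checksum, a run of length-prefixed name strings, and a checksum byte that makes the whole record sum to zero modulo 256. The function names are made up for illustration and are not taken from any assembler.

```python
import struct

LNAMES = 0x96  # OMF record type for a list of names


def encode_lnames(names):
    """Build one LNAMES record: type, 16-bit length, counted strings, checksum."""
    # Each name is stored as a length byte followed by its characters.
    body = b"".join(bytes([len(n)]) + n.encode("ascii") for n in names)
    # The length field counts the body plus the trailing checksum byte.
    header = struct.pack("<BH", LNAMES, len(body) + 1)
    # The checksum makes the byte sum of the whole record zero modulo 256.
    checksum = (-sum(header + body)) & 0xFF
    return header + body + bytes([checksum])


def decode_lnames(record):
    """Return the name strings held in one LNAMES record."""
    rec_type, length = struct.unpack_from("<BH", record, 0)
    if rec_type != LNAMES:
        raise ValueError("not an LNAMES record")
    names = []
    pos = 3                  # first byte after the 3-byte header
    end = 3 + length - 1     # stop before the checksum byte
    while pos < end:
        n = record[pos]
        names.append(record[pos + 1:pos + 1 + n].decode("ascii"))
        pos += 1 + n
    return names
```

SEGDEF and GRPDEF records then refer to these strings by their position in the list rather than by repeating the text.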

When there are multiple segments in a .OBJ, MASM collects all the names used, removes duplicates, and (typically) emits a single LNAMES record. Since segment and group definitions can be scattered all over a .ASM source file, one must of necessity scan the whole thing to the end to encounter them all. The fact that MASM eliminates duplicate names before emitting an LNAMES record into the .OBJ shows that it had to scan the .ASM file all the way to the end and adjust the name indexes in all the SEGDEF and GRPDEF records once the duplicates were zapped.
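The collect-and-dedupe step can be sketched roughly like this in Python. This is my own illustration of the idea, not MASM's actual code; the `(segment_name, class_name)` input shape and the function name are assumptions. The one format detail it relies on is that OMF name indexes are 1-based.

```python
def dedupe_names(segments):
    """segments: list of (segment_name, class_name) pairs in source order.

    Returns (names, segdefs): names is the duplicate-free list for a single
    LNAMES record, and segdefs holds 1-based indexes into that list.
    """
    names = []       # unique names, in first-seen order
    index_of = {}    # name -> 1-based LNAMES index

    def intern(name):
        # Add the name only the first time it is seen; always return its index.
        if name not in index_of:
            names.append(name)
            index_of[name] = len(names)
        return index_of[name]

    segdefs = [(intern(seg), intern(cls)) for seg, cls in segments]
    return names, segdefs


segments = [("_TEXT", "CODE"), ("_DATA", "DATA"), ("MORE_TEXT", "CODE")]
names, segdefs = dedupe_names(segments)
# "CODE" is stored once, and both code segments point at the same index.
```

The final indexes are only known once every definition has been seen, which is why this approach implies reading the source to the end before writing the record.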

TASM takes a different approach. TASM emits an LNAMES record for every segment it encounters. This suggests that TASM doesn't make multiple passes to collect name strings. Nor does TASM bother to remove duplicate names. The LNAMES records TASM emits will contain duplicate strings, and there will be a lot more LNAMES records than MASM would emit. From a .OBJ semantics point of view, both approaches are equivalent. The TASM approach, however, results in .OBJ files that are plumper and make a linker do more work.
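To get a feel for the size difference, here is a rough Python comparison of the two strategies. It assumes each per-segment record carries just that segment's name and class name, which is my simplification and not a claim about TASM's exact output; the sizes assume the standard OMF framing of 1 type byte, 2 length bytes, and 1 checksum byte per record.

```python
def record_size(names):
    """Bytes taken by one LNAMES record holding the given names."""
    # 1 type byte + 2 length bytes + counted strings + 1 checksum byte
    return 3 + sum(1 + len(n) for n in names) + 1


def per_segment_records(segments):
    """One name list per segment, duplicates kept (the per-segment strategy)."""
    return [[seg, cls] for seg, cls in segments]


def single_record(segments):
    """One duplicate-free name list for the whole module (the collected strategy)."""
    seen = []
    for pair in segments:
        for name in pair:
            if name not in seen:
                seen.append(name)
    return [seen]


def total_size(records):
    return sum(record_size(r) for r in records)


segments = [("_TEXT", "CODE"), ("INIT_TEXT", "CODE"), ("_DATA", "DATA")]
print(total_size(per_segment_records(segments)),
      total_size(single_record(segments)))
```

The per-segment version pays the 4-byte record overhead once per segment and repeats every shared class name, so the gap grows with the number of segments.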

Single-pass - faster generation of result, plumper result.
Multi-pass - slower generation of result, more efficient result.

In reality, MASM will generate multiple LNAMES records too, when a single LNAMES record would otherwise exceed 1K bytes in length. TASM never encounters this issue because it's emitting scads of tiny LNAMES records. Apparently there are some tools somewhere that impose a 1K limit on the size of a record in a .OBJ. Internally, the project I'm working on deals with LNAMES in an idealized manner and collects the whole wad the way MASM does, so I had to insert code to parcel the names out into physical LNAMES records of at most 1K when I emit a .OBJ as well.
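The parceling itself is simple. Below is a minimal Python sketch of the idea, not the code from my project. It assumes the 1K cap applies to the whole record including its 4 bytes of framing; if the tool in question measures only the length field, the `overhead` value would need adjusting. The names `split_names` and `MAX_RECORD` are invented for the example.

```python
MAX_RECORD = 1024  # assumed cap on the total size of one record, in bytes


def split_names(names, limit=MAX_RECORD):
    """Greedily pack names into groups whose LNAMES record fits in `limit` bytes."""
    overhead = 4  # type byte + 2 length bytes + checksum byte
    groups = []
    current = []
    size = overhead
    for name in names:
        cost = 1 + len(name)  # length prefix + characters
        # Start a new record when this name would push the current one over.
        if current and size + cost > limit:
            groups.append(current)
            current = []
            size = overhead
        current.append(name)
        size += cost
    if current:
        groups.append(current)
    return groups


# Hypothetical module with 300 five-byte name entries.
groups = split_names(["N%03d" % i for i in range(300)])
```

Because name indexes simply keep counting across consecutive LNAMES records, splitting the list this way should not require touching the indexes already assigned to SEGDEF and GRPDEF records.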
