Is there a reliable way to identify an MS-DOS executable?


I'm trying to identify and filter out all the MS-DOS executables from a set of executable files I have.

As far as I know, a PE file differs from an MS-DOS executable in having headers that an MS-DOS executable lacks, but for some reason TrID identifies some of my samples as MS-DOS even though they are PE files.

I can't find any documentation on the subject, and I searched a lot.

Thanks!


Solution

  • The problem with identifying MS-DOS executables is that technically Windows PECOFF executables are also valid MS-DOS executables. PECOFF executables are prefixed with an "MS-DOS Stub", which is a complete MS-DOS program that in most executables prints a message like "This program cannot be run in DOS mode".

    So the first thing to do is to look at the MS-DOS executable header and see if it's valid. It looks like this (from Ralf Brown's Interrupt List):

     00h  2 BYTEs .EXE signature, either "MZ" or "ZM" (5A4Dh or 4D5Ah)
                  (see also #01593)
     02h  WORD    number of bytes in last 512-byte page of executable
     04h  WORD    total number of 512-byte pages in executable (includes any
                  partial last page)
     06h  WORD    number of relocation entries
     08h  WORD    header size in paragraphs
     0Ah  WORD    minimum paragraphs of memory required to allocate in addition
                  to executable's size
     0Ch  WORD    maximum paragraphs to allocate in addition to executable's size
     0Eh  WORD    initial SS relative to start of executable
     10h  WORD    initial SP
     12h  WORD    checksum (one's complement of sum of all words in executable)
     14h  DWORD   initial CS:IP relative to start of executable
     18h  WORD    offset within header of relocation table
                  40h or greater for new-format (NE,LE,LX,W3,PE,etc.) executable
     1Ah  WORD    overlay number (normally 0000h = main program)
    

    The key values to check are at offsets 00h and 18h. The two bytes at the start of the file, the signature, must be "MZ", which is 5A4Dh when read as a 16-bit little-endian value. While "ZM" also works for MS-DOS programs, Windows requires that PECOFF executables use the more common "MZ" signature. The next thing to check is the 16-bit value at offset 18h. It needs to be greater than or equal to 40h for this to be a PECOFF executable.
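
As a rough illustration of these two checks, here is a minimal Python sketch. The function name `check_dos_header` and the keys of the returned dictionary are made up for this example; the field layout is taken from the header table above, read as fourteen little-endian 16-bit words.

```python
import struct

# Offsets 00h..1Bh of the header table: fourteen little-endian 16-bit words
# (the DWORD CS:IP at 14h is read as two words, IP then CS).
DOS_HEADER = struct.Struct("<14H")


def check_dos_header(data: bytes) -> dict:
    """Report the checks on offsets 00h and 18h (hypothetical helper)."""
    result = {
        "dos_signature": False,    # "MZ" or "ZM" at offset 00h
        "pe_signature_ok": False,  # "MZ" specifically, as PECOFF requires
        "new_format_hint": False,  # value at offset 18h is >= 40h
    }
    if len(data) < DOS_HEADER.size:
        return result              # too short to hold the header at all

    fields = DOS_HEADER.unpack_from(data, 0)
    signature = data[0:2]
    reloc_table_offset = fields[12]  # word index 12 is byte offset 18h

    result["dos_signature"] = signature in (b"MZ", b"ZM")
    result["pe_signature_ok"] = signature == b"MZ"
    result["new_format_hint"] = reloc_table_offset >= 0x40
    return result
```

A file that fails `dos_signature` is not an MS-DOS style executable at all. One that passes it but fails `new_format_hint` would be treated as a plain MS-DOS program under the rule described here.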

    If the values at offsets 00h and 18h check out, then the next thing to do is to read the 32-bit value at offset 3Ch. This contains the offset of the actual PECOFF header. You then need to check that the header starts with the signature "PE\0\0", that is, the two characters "P" and "E", followed by two 0 bytes.

    Note that it's possible to find other letters at the location given at offset 3Ch, like "NE", "LE", and "LX", which were used for 16-bit Windows executables, VxDs, and 32-bit OS/2 executables respectively. These other executable formats also have MS-DOS stubs and locate their real headers the same way.
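
Putting the steps together, the whole procedure might be sketched in Python as follows. This is only an illustration of the steps as described above, not a tested file identifier: `classify_executable` and its return strings are names invented for this example, and treating every file that passes the "MZ"/"ZM" test but has no recognized new-format signature as plain MS-DOS is an assumption.

```python
import struct


def classify_executable(data: bytes) -> str:
    """Classify a file image by following the steps above (hypothetical helper)."""
    # Step 1: the MS-DOS header must be present and start with "MZ" or "ZM".
    if len(data) < 0x1C or data[0:2] not in (b"MZ", b"ZM"):
        return "not an MZ executable"

    # Step 2: the word at offset 18h must be >= 40h for a new-format file,
    # and the file must be long enough to contain the dword at offset 3Ch.
    (reloc_table_offset,) = struct.unpack_from("<H", data, 0x18)
    if reloc_table_offset < 0x40 or len(data) < 0x40:
        return "MS-DOS"

    # Step 3: offset 3Ch holds the file offset of the new-format header.
    (new_header_offset,) = struct.unpack_from("<I", data, 0x3C)
    # Slicing past the end of the data just yields a shorter (or empty) result.
    signature = data[new_header_offset:new_header_offset + 4]

    # Step 4: "PE\0\0" marks a PECOFF file, which must also use "MZ".
    if signature == b"PE\0\0" and data[0:2] == b"MZ":
        return "PE"

    # Other formats that reuse the same stub mechanism.
    if signature[:2] in (b"NE", b"LE", b"LX"):
        return signature[:2].decode("ascii")

    # Assumption: anything else is treated as a plain MS-DOS program.
    return "MS-DOS"
```

To use it on a sample, read the file in binary mode (for example `open(path, "rb").read()`) and pass the bytes in. Keeping only the files for which it returns "MS-DOS" gives the filtering asked about in the question.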