Tags: assembly, ascii, dos, x86-16, codepages

8086 Assembly Int 21h and Extended ASCII characters


I need some help with an assignment. I have to process a plain ASCII text file and report how many times each character code occurs (how many a's, how many b's, and so on). It now works almost perfectly.

The remaining problem is that when the file contains an extended ASCII character, reading it with INT 21h function 3Fh does not give me the value I expect.

For example, if the file has an é (ASCII code 130), the program reads an ß (ASCII code 225). I'm afraid I'm using the interrupt incorrectly, but I don't know what to change, so any help would be greatly appreciated. Debugging doesn't help either: the interrupt completes without errors and simply returns the wrong values in the buffer.

This is the exact code I'm using to read the file. I have the handle from a previous interrupt.

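      xor ax,ax             ; clear AX (not required by function 3Fh; AH is set below)
      lea dx, buffer        ; DS:DX -> buffer that receives the data
      mov ah,3fh            ; DOS function 3Fh: read from file or device
      mov bx,handle         ; BX = file handle returned by the earlier open call
      mov cx,4096           ; CX = maximum number of bytes to read
      int 21h               ; CF clear: AX = bytes actually read; CF set: AX = error code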

Thanks!

EDIT

I found the problem, but I have no idea how to solve it. It turns out that the character read as 225 is not é but á. The code for á should be 160 according to every ASCII table I've found, but it is 225 in Unicode. That is odd, since I'm specifically telling Notepad to save the file as ANSI, not ASCII.


Solution

  • You are confusing code pages.

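    MS-DOS uses code page 437 (the original IBM PC character set), where é is code 130 and á is code 160. Notepad in ANSI mode uses the system's Windows code page, which is 1252 on Western-language systems, where é is code 233 and á is code 225. So Notepad wrote your á as byte 225, your program read that byte correctly, and code page 437 displays byte 225 as ß.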

    ASCII is defined only up to 127, so there is no such thing as an ASCII chart for 130 or 160. Extended ASCII is not standardized, so different people extend it in different ways. In particular, MS-DOS and Windows use different code pages which are effectively different extended ASCII tables.

    If you're going to be using MS-DOS to manipulate your file, then use code page 437. If you're going to be using Windows to manipulate your file, then use code page 1252. (Or better, use Unicode.)

    But a file containing bytes above 127 cannot be interpreted the same way in both MS-DOS and Windows unless you convert it, in the same way you cannot write a book that can be interpreted the same in both English and French.
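
To see the mismatch concretely, here is a minimal sketch in Python (used instead of 8086 assembly only so the byte values can be checked on a modern machine). It assumes Python's built-in `cp437` and `cp1252` codecs. The helper names `describe` and `windows_to_dos` are made up for illustration and are not part of the original answer.

```python
import unicodedata


def char_name(raw: bytes, codec: str) -> str:
    """Decode one byte with the given code page and return its Unicode name."""
    # errors="replace" covers the few byte values cp1252 leaves undefined.
    ch = raw.decode(codec, errors="replace")
    return unicodedata.name(ch, "<unnamed control character>")


def describe(byte_value: int) -> str:
    """Show how a single byte is interpreted under each code page."""
    raw = bytes([byte_value])
    return (f"{byte_value}: cp437={char_name(raw, 'cp437')}, "
            f"cp1252={char_name(raw, 'cp1252')}")


def windows_to_dos(data: bytes) -> bytes:
    """Re-encode bytes saved by Notepad (cp1252) as code page 437 bytes.

    Characters that have no cp437 equivalent become '?'.
    """
    text = data.decode("cp1252", errors="replace")
    return text.encode("cp437", errors="replace")


if __name__ == "__main__":
    # The four byte values mentioned in the question and the answer.
    for value in (130, 160, 225, 233):
        print(describe(value))
    # The byte Notepad wrote for á, converted to what a DOS program expects.
    print(list(windows_to_dos(bytes([225]))))
```

With these definitions, `windows_to_dos(bytes([225]))` yields the single byte 160, which is the value the DOS program expected for á. Converting the file this way before counting (or saving it from a DOS editor in the first place) is one way to make the counts line up with a code page 437 table.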