file assembly character-encoding dos tasm

My asm code writes trash bytes when I use int 21h function


I searched Stack Overflow and did not find anything similar to my problem. I have code that opens a file and writes a message at the end. When I use int 21h to write to the file for the first time, it works correctly if the file is empty, but if the file already has content, the program appends a lot of trash bytes (characters like 畂 or other Japanese or Chinese characters).

I have checked that the program doesn't write more bytes than the message length. Please help me. Here is my source code:

.model tiny
.code

main:
    call delta          
delta:
    pop bp              
    sub bp, offset delta


    mov ax, @code       ;Get the address of code segment and store it in ax
    mov ds, ax          ;Put that value in Data segment pointer.
                    ;Now, we can reference any data stored in the code segment
                    ;without fail.
;Subroutines
open:
    mov ax, 3D02H   ;Opens a file
    lea dx, [bp+filename];Filename
    int 21h         ;Call DOS interrupt
    mov handle, ax  ;Save the handle in variable

move_pointer_to_end:
    mov bx, handle
    mov  ax,4202h                 ; Move file pointer
    xor  cx,cx                    ; to end of file
    cwd                           ; xor dx,dx
    int  21h

write:
    mov ax, 4000H
    mov bx, handle
    lea dx, [bp+sign]
    mov cx, 16
    int 21H

exit:
    mov ah,4Ch          ;Terminate process
    mov al,0            ;Return code
    int 21h

datazone:
    handle dw ?
    filename db 'C:\A.txt', 0
    sign db 'Bush was here!!', 0

end main

Please help me!!


Solution

  • That's because the file you're appending to is encoded as UTF-16, which Notepad calls "Unicode" in its Save As dialog. Your program is writing the right bytes; the editor is just interpreting them differently. If you create the file in Notepad or another text editor, pick ANSI as the encoding when you save it. Then point your program at the ANSI-encoded text file, and it should append the string as expected.

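    UTF-16 stores most characters, including every ASCII one, as two bytes, so in a hex editor the existing text looks like s.o.m.e.t.h.i.n.g. .l.i.k.e. .t.h.i.s. (each dot being a 00 byte) rather than something like this, which is what you would see for ANSI or UTF-8. The bytes your program appends are one per character, so an editor that reads the file as UTF-16 pairs them up: "Bu" (42h 75h) becomes the single character U+7542, which is the 畂 you are seeing.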
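If you want to convince yourself without a hex editor, here is a rough Python sketch that rebuilds the situation in memory. It is only an illustration: the existing file contents ("hi") are made up, and the appended bytes are taken from the `sign` string and the `mov cx, 16` in the question (15 characters plus the terminating 0).

```python
import codecs

# Hypothetical existing file: "hi" as Notepad saves it with the "Unicode"
# encoding, i.e. a UTF-16LE byte order mark followed by two bytes per character.
existing = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")

# The 16 bytes the DOS program appends: one byte per character plus the NUL.
appended = b"Bush was here!!\x00"

file_bytes = existing + appended

# An editor that sees the BOM decodes the whole file as UTF-16, so each pair
# of appended bytes is read as a single character.
as_utf16 = file_bytes.decode("utf-16")

print(existing.hex(" "))          # ff fe 68 00 69 00
print(as_utf16[:2])               # hi
print(as_utf16[2] == "\u7542")    # True: bytes 42 75 ("Bu") read as U+7542
```

The file on disk is exactly what the program was asked to write. Only the interpretation changes, which is why the same program looks correct on an empty or ANSI file.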