
How can I detect an EOF in assembly, nasm?


I am trying to detect an EOF character, or just any character at all, but it doesn't work, and there is no error either.

section .data
    file db "text.txt", 0

section .bss
    char resb 1

section .text
    global _start

_start:
    mov rax, 2
    mov rdi, file
    syscall
    mov rbx, rax

    mov rdi, rbx
    mov rax, 0
    mov rsi, char
    mov rdx, 1
    syscall

    mov rcx, char
    cmp rcx, -1
    je _endOfFile

    call _end

_endOfFile:
    print 1, file, 0
    ret

_end:
    mov rax, 3
    mov rdi, rbx
    syscall

    mov rax, 60
    mov rdi, 0
    syscall

I expected it to print the name of the file, but it doesn't do anything. When I remove the cmp and make the jump unconditional, it prints fine. I also tried comparing against other characters, and that didn't work either. I am really new to assembly, so I have no clue what to do.


Solution

  • Okay, a few layers of problems here.

    Most fundamental is that there is no "EOF character". Unlike ISO C's getc(), the Unix read system call doesn't signal end-of-file by reading back a particular character; it signals it by returning 0 as its return value. So you need to check the value in rax after the read syscall. If it is zero, then you have reached end-of-file. If it is 1, then you successfully read a character into the memory location char. If it is a small negative number (on Linux, in the range -4095 to -1), then an error occurred, and the negation of this value is an errno code.
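
    As a rough sketch of that check, here is one way the program from the question could be restructured to read one byte at a time until read returns 0. It assumes Linux x86-64 and a static, non-PIE link with ld. The labels read_loop, at_eof, fail and exit and the constant file_len are names made up for this example, and the undefined print macro from the question is replaced by a plain write syscall.

    ```nasm
    ; Build (assumed toolchain): nasm -f elf64 eof.asm && ld -o eof eof.o
    section .data
        file     db "text.txt", 0
        file_len equ $ - file - 1   ; length without the terminating 0

    section .bss
        char resb 1

    section .text
        global _start

    _start:
        mov eax, 2              ; sys_open
        mov rdi, file           ; address of the path string
        xor esi, esi            ; flags = O_RDONLY (0)
        xor edx, edx            ; mode, unused without O_CREAT
        syscall
        test rax, rax
        js  fail                ; negative return value: -errno
        mov rbx, rax            ; keep the file descriptor

    read_loop:
        xor eax, eax            ; sys_read (0)
        mov rdi, rbx            ; file descriptor
        mov rsi, char           ; address of the 1-byte buffer
        mov edx, 1              ; read at most one byte
        syscall
        test rax, rax           ; set flags from the return value
        jz  at_eof              ; 0 bytes read: end-of-file
        js  fail                ; negative: -errno
        jmp read_loop           ; 1 byte is now in [char]; keep reading

    at_eof:
        mov eax, 1              ; sys_write
        mov edi, 1              ; stdout
        mov rsi, file
        mov edx, file_len
        syscall

        mov eax, 3              ; sys_close
        mov rdi, rbx
        syscall

        xor edi, edi            ; exit status 0
        jmp exit

    fail:
        mov edi, 1              ; exit status 1 on any error
    exit:
        mov eax, 60             ; sys_exit
        syscall
    ```

    Note that the end-of-file decision is made entirely from rax; the contents of char are never compared against anything.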

    The comparison code also has a few bugs. First of all, mov rcx, char doesn't load the character from char, it loads the address of char, which naturally does not equal -1. If you look, this has exactly the same form as the mov rsi, char you used to set up the system call, which likewise put the address of char into rsi.

    To specify the contents of memory at location char, you use square brackets: mov rcx, [char]. However, that wouldn't be right either. On x86-64, most instructions can operate on 8, 16, 32 or 64 bit operands. When at least one operand is a register, the size of the specified register dictates the operand size. So mov rcx, [char] would load 8 bytes, of which the lowest would be the byte from char, and the other 7 would be whatever garbage happened to follow it in memory.

    To load one byte, use an 8-bit register, like cl. Then you need to likewise do the compare with only the 8-bit register, or else you're comparing against stuff that is not your character.

    mov cl, [char]
    cmp cl, -1
    je got_ff
    

    Though actually, in most cases, instead of mov cl, [char] it would be better to do movzx ecx, byte [char], which zeros out the upper bits of rcx. mov cl, [char] is defined as preserving those bits, so the new byte has to be merged into the old value of rcx, which comes with a slight performance cost.
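
    To make the difference between these forms concrete, here is a small standalone sketch, again assuming Linux x86-64 and a static, non-PIE link. The junk bytes are invented for the demonstration and stand in for whatever happens to follow char in memory.

    ```nasm
    ; Build (assumed toolchain): nasm -f elf64 demo.asm && ld -o demo demo.o
    section .data
        char db 0x41                                       ; the byte we care about ('A')
        junk db 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77   ; stand-in for whatever follows

    section .text
        global _start

    _start:
        mov   rcx, char             ; rcx = address of char, not its contents
        mov   rcx, [char]           ; rcx = 0x7766554433221141: 8 bytes loaded
        movzx ecx, byte [char]      ; rcx = 0x41: one byte, upper bits zeroed

        mov   edi, ecx              ; use the byte as the exit status
        mov   eax, 60               ; sys_exit
        syscall
    ```

    Running it and then checking echo $? in the shell should show 65, the value of the single byte at char.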

    But actually actually, you don't need to load the character into a register at all; cmp works fine with a memory operand.

    cmp byte [char], -1
    je it_was_ff