
Comparing a character in nasm x86-64 using 64 bit registers


I am new to x86-64 assembly and am experimenting with various programs.

I am trying to compare a character against 'A' in NASM x86-64 code, but the comparison never comes out equal.

; Tests the command line. Similar to a C main function:
;  int main(int argc, char** argv)
;
; - argc, number of args in rdi
; - argv, pointer to an array of string pointers, in rsi

    global  main
    extern  puts
main:
    mov     rbp, rdi      ; argc in rdi. Save in rbp, callee-saved
    mov     rbx, rsi      ; argv in rsi. Save in rbx, callee-saved
    
    add     rbx, 8
    dec     rbp
    mov     rdi, [rbx]
    cmp     rdi, 'A'
    je      exit1
    call    puts
    mov     rax, 0
    ret

exit1:  
    mov     rax, 1
    ret

How can the code be modified to get the desired result? There are existing solutions, such as the question "comparing characters in assembly, nasm", but they use 8-bit registers. To get my basics straight, I would like to understand why a direct comparison like the one in my code does not work.


Solution

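    • argv is a pointer to an array of pointers, each pointing to a null-terminated string. After add rbx, 8, the instruction mov rdi, [rbx] loads argv[1], which is itself an address (a char *), not the characters of the string. You must dereference that pointer once more, e.g. mov rbx, [rsi + 8] followed by a read from [rbx], so that you are actually inspecting the char values.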

    • The size of the operands matters for cmp (and in fact for virtually all instructions). To quote Intel’s instruction reference on cmp:

      When an immediate value is used as an operand, it is sign-extended to the length of the first operand.

      You are in fact comparing two 64-bit quantities.

    • There are (at least) two possible solutions:

      • cmp dil, 'A': You can still retrieve 8 Bytes from memory at once, but for comparison purposes you’ll need to specify a partial register.
      • Insert movsx rdi, dil before cmp (or combined with the fetch in one step movsx rdi, byte [rbx]): As Jester suggested, you can still perform a 64-bit comparison, but ensure the upper bits are set properly.
    • Writing cmp rdi, 'A' is equivalent to writing cmp rdi, 0x41. On execution, 0x41 is sign-extended to 64 bits, so you end up comparing 0x0000000000000041 with the contents of rdi. If you have previously loaded eight bytes from [rbx], the upper seven bytes hold whatever follows the first character in memory: the subsequent characters of the string, or unrelated data past its terminator. That is why the comparison fails.

    • Side note: I’m not sure whether you can call puts like that, speaking of parameter passing. Isn’t rdi supposed to contain the address of a null-terminated string?
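
Putting the points above together, here is a minimal sketch of how the program could look. It is not the only way to write it. It assumes Linux, the System V AMD64 calling convention, and linking against libc through gcc; the file name cmpchar.asm and the local labels .done and .is_a are made up for illustration. It keeps the 64-bit comparison the question asks about by sign-extending a single byte first.

```nasm
; Sketch: return 1 if argv[1] starts with 'A', otherwise print argv[1] and return 0.
; Assumed build: nasm -felf64 cmpchar.asm && gcc -no-pie cmpchar.o -o cmpchar

        global  main
        extern  puts

        section .text
main:
        push    rbx                 ; rbx is callee-saved; the push also aligns rsp to 16 bytes
        cmp     rdi, 2              ; argc < 2 means there is no argv[1]
        jl      .done
        mov     rbx, [rsi + 8]      ; rbx = argv[1], the address of the string
        movsx   rax, byte [rbx]     ; load ONE byte and sign-extend it to 64 bits
        cmp     rax, 'A'            ; 64-bit compare; upper bits are now well defined
                                    ; (cmp byte [rbx], 'A' would do the same in one step)
        je      .is_a
        mov     rdi, rbx            ; puts expects the string's address in rdi
        call    puts
.done:
        xor     eax, eax            ; return 0
        pop     rbx
        ret
.is_a:
        mov     eax, 1              ; return 1
        pop     rbx
        ret
```

With this sketch, running ./cmpchar Apple should exit with status 1 without printing anything, while ./cmpchar hello should print hello and exit with status 0. Loading a single byte also avoids reading eight bytes from a string that may be shorter than that.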