Tags: assembly, x86, nasm, strcmp

Subtracting two characters


I just started programming in assembly so I am a beginner.

To practice, I am trying to rewrite a basic libc in assembly (NASM Intel syntax).

But I'm stuck on the strcmp function:

;; Compare two C-style NUL-terminated strings
;; Inputs   :  ESI = address of s1, EDI = address of s2
;; Outputs  :  EAX = return an integer less than, equal to, or greater than zero if s1 is found, respectively, to be less than, to match, or be greater than s2
strcmp:
    call strlen
    mov ecx, eax ; ecx = length of the string in esi

    repe cmpsb
    sub esi, edi ; result = *esi - *edi
    
    mov eax, esi
    
    ret

For me, it should work like this:

s1 db 'Hello World', 0
s2 db 'Hello Stack', 0

After the repe cmpsb instruction, ESI should be equal to [s1 + 7] and EDI to [s2 + 7].

So I just have to do EAX = 'W' - 'S' = 87 - 83 = 4

The problem is, it doesn't work. I think the problem is that when I execute this instruction:

sub esi, edi ; result = *esi - *edi

I don't think that it means: subtract the characters pointed to by EDI and ESI.

Does anyone have an idea on how I can do this?


Solution

  • Your code is almost correct. There are three issues left:

    • you should not assume that strlen preserves the contents of esi and edi unless you have explicitly specified that it does so. It's very easy to change strlen later and forget about that requirement, leading to all sorts of annoying problems.
    • instead of returning the difference between *esi and *edi, you return the difference between esi and edi, that is, between the two addresses. Also, as cmpsb advances esi and edi by one after each comparison, the last characters compared are found at esi[-1] and edi[-1].
    • you have an off-by-one error: strlen returns the number of characters that precede the NUL byte, but you need to compare the NUL byte as well. Otherwise, two strings compare as equal whenever the first is a prefix of the second, since you never check that the second string actually ends where the first one does.

    To fix the first issue, I recommend saving and restoring esi and edi around the call to strlen. The easiest way to do so is to push them on the stack:

        push esi             ; save ESI and EDI
        push edi
        call strlen          ; compute the string length
        pop  edi             ; restore ESI and EDI
        pop  esi
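
Once the pointers are restored, the off-by-one in the count can be handled before the comparison starts. A minimal sketch, assuming (as in the question) that strlen returns the length in EAX:

```nasm
    mov  ecx, eax        ; ECX = number of characters before the NUL
    inc  ecx             ; +1 so that the NUL byte is compared as well
    repe cmpsb           ; stops after the first mismatch or after ECX bytes
```

Because the count now includes the terminator, comparing "ab" with "abc" stops on NUL versus 'c' instead of reporting a match.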
    

    The second issue is fixed by loading the characters to compare from memory, computing the difference, and then storing the result to eax:

        movzx eax, byte [esi-1] ; load byte from ESI[-1] and zero extend into EAX
        movzx ecx, byte [edi-1] ; load byte from EDI[-1] and zero extend into ECX
        sub   eax, ecx          ; compute the difference
    

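    These loads already use the [esi-1] and [edi-1] offsets mentioned in the second issue. The third issue is separate: ecx must hold the length returned by strlen plus one, so that the NUL byte takes part in the comparison. Note that movzx is needed here instead of the slightly simpler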

        mov   al, [esi-1]       ; load byte from ESI[-1] into AL
        sub   al, [edi-1]       ; subtract EDI[-1] from AL
    

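    since the 8-bit version leaves the upper 24 bits of eax unchanged, and an 8-bit difference can wrap around and come out with the wrong sign. Zero-extending both bytes first gives a 32-bit difference in eax that always has the correct sign.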
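
Putting the three fixes together, the whole routine might look like the sketch below. It assumes, as in the question, that strlen takes its argument in ESI and returns the length in EAX. The cld instruction is an addition not present in the original code; it is there only in case the direction flag is not already clear. The routine also assumes the caller accepts that ESI, EDI and ECX are clobbered.

```nasm
;; Compare two C-style NUL-terminated strings (sketch, 32-bit NASM)
;; Inputs   :  ESI = address of s1, EDI = address of s2
;; Outputs  :  EAX < 0, = 0 or > 0 if s1 is less than, equal to or greater than s2
;; Clobbers :  ECX, ESI, EDI (assumption: the caller does not need them preserved)
strcmp:
    push  esi                 ; strlen may clobber ESI and EDI, so save them
    push  edi
    call  strlen              ; EAX = length of s1, not counting the NUL byte
    pop   edi                 ; restore in reverse order
    pop   esi

    lea   ecx, [eax + 1]      ; count = length + 1, so the NUL is compared too
    cld                       ; make cmpsb walk forward through memory
    repe  cmpsb               ; stop after the first mismatch or after ECX bytes

    movzx eax, byte [esi - 1] ; last byte compared from s1, zero-extended
    movzx ecx, byte [edi - 1] ; last byte compared from s2, zero-extended
    sub   eax, ecx            ; difference of the two bytes as the result
    ret
```

With s1 = 'Hello World' and s2 = 'Hello Stack', the scan stops after comparing 'W' and 'S', so EAX ends up as 87 - 83 = 4. For two identical strings the last bytes compared are both NUL, so EAX is 0.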