Why do we have to add -2 when performing operations on a string in assembly language?

Data Segment
    str1 db 'MADAME','$'
    strlen1 dw $-str1  ;calculating the length of the string
    strrev db 20 dup(' ')
    s1 db 'String is:','$'
    NEWLINE DB 10,13,"$"
    str_palin db 'String is Palindrome.','$'
    str_not_palin db 'String is not Palindrome.','$'
Data Ends

Code Segment
    Assume cs:code, ds:data

Begin:
    mov ax, data
    mov ds, ax
    mov es, ax
    mov cx, strlen1
    add cx, -2

    lea si, str1
    lea di, strrev

    add si, strlen1
    add si, -2
    mov ah, 09h
    lea dx, s1
    int 21h
    mov ah, 09h
    lea dx, str1
    int 21h
    mov ah, 09h
    lea dx, NEWLINE
    int 21h
L1:
    mov al, [si]
    mov [di], al
    dec si
    inc di
    loop L1
    mov al, [si]
    mov [di], al
    inc di
    mov dl, '$'
    mov [di], dl
    mov cx, strlen1

    lea si, str1
    lea di, strrev
    repe cmpsb
    jne Not_Palin

    mov ah, 09h
    lea dx, str_palin
    int 21h
    jmp Exit

Not_Palin:
    mov ah, 09h
    lea dx, str_not_palin
    int 21h

Exit:
    mov ax, 4c00h
    int 21h
Code Ends
End Begin


  • First instance of adding -2 (add cx, -2)


    mov cx, strlen1
    add cx, -2        <-- Can be avoided totally

    and also

     mov al, [si]
     mov [di], al
     dec si
     inc di
     loop L1
     mov al, [si]     <-- Should stay inside the loop
     mov [di], al     <-- Should stay inside the loop
     inc di           <-- Should stay inside the loop

    Because of how strlen1 was defined (strlen1 dw $-str1), it also counts the terminating $ character, so for 'MADAME' it holds 7. The add cx, -2 (why is this not simply sub cx, 2?) then yields 5, which is 1 less than the actual length of 6. Because of this, your L1 loop has to be followed by 3 extra instructions that copy the remaining first character!
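    The off-by-one can be traced with a small Python model of the original reversal code. This is only a sketch: the bytes of str1 become a bytes object and SI, DI and CX become plain integers, so the function name `reverse_original` is hypothetical and nothing here is 8086 code.

```python
# Python model of the question's reversal code (not real 8086 code):
# SI, DI and CX are plain integers, memory is a bytes object.
STR1 = b"MADAME$"        # str1 db 'MADAME','$'
STRLEN1 = len(STR1)      # strlen1 dw $-str1 -> 7, the '$' is counted too


def reverse_original(src: bytes, strlen1: int) -> tuple:
    """Return (reversed bytes, number of LOOP iterations)."""
    cx = strlen1 - 2             # mov cx, strlen1 / add cx, -2
    si = strlen1 - 2             # add si, strlen1 / add si, -2 -> index of 'E'
    dst = bytearray()            # strrev, DI starts at its beginning
    iterations = 0
    while True:                  # L1:
        dst.append(src[si])      # mov al, [si] / mov [di], al / inc di
        si -= 1                  # dec si
        iterations += 1
        cx -= 1                  # LOOP decrements CX ...
        if cx == 0:              # ... and only jumps back while CX != 0
            break
    dst.append(src[si])          # the 3 extra instructions copy src[0] ('M')
    dst.append(ord("$"))         # mov dl, '$' / mov [di], dl
    return bytes(dst), iterations


rev, n = reverse_original(STR1, STRLEN1)
print(STRLEN1, n, rev)           # 7 5 b'EMADAM$'
```

    The loop body runs only 5 times for a 6-character string, which is why the first character has to be copied separately after the loop.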

  • Second instance of adding -2 (add si, -2)

    lea si, str1
    add si, strlen1
    add si, -2

    Here again, why prefer add si, -2 over the more readable sub si, 2?
    Because of how strlen1 was defined (strlen1 dw $-str1), add si, strlen1 makes SI point behind the terminating $ character.
    Subtracting 1 makes SI point at the terminating $ character, i.e. just behind the last character of the string.
    Subtracting 2 makes SI point at the last character of the string.
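    The three pointer positions can be checked with a few lines of Python that treat str1 as a bytes object and SI as an index into it. The helper `pointee` is hypothetical, meant only to show which byte each offset lands on.

```python
STR1 = b"MADAME$"        # str1 db 'MADAME','$'
STRLEN1 = len(STR1)      # strlen1 dw $-str1 -> 7


def pointee(subtract: int):
    """Byte at str1 + strlen1 - subtract, or None if it lies beyond the '$'."""
    index = STRLEN1 - subtract
    return chr(STR1[index]) if index < len(STR1) else None


for k in (0, 1, 2):
    print(f"str1 + strlen1 - {k} = str1 + {STRLEN1 - k}: {pointee(k)!r}")
```

    This prints None (past the $), '$' and 'E' for subtracting 0, 1 and 2 respectively.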


    Most of the above problems would not exist if you redefined strlen1 so that it does not include the terminating $ character. When people talk about the length of a string, they rarely include any terminating character in the count. Such a character (be it $ or zero) is not really part of the string.

    strlen1 dw $ - str1 - 1  ;Length of the string

    To see everything in context:

     mov  ah, 09h
     mov  dx, offset s1
     int  21h
     mov  ah, 09h
     mov  dx, offset str1
     int  21h
     mov  ah, 09h
     mov  dx, offset NEWLINE
     int  21h
     cld                 ;To be absolutely safe
     mov  cx, strlen1    ;The improved definition! db 'MADAME','$' => 6
     mov  di, offset strrev
     mov  si, offset str1
     add  si, cx         ;Now points behind the last character ('E')
L1:  dec  si
     mov  al, [si]
     stosb               ;Equivalent to "mov [di], al" "inc di"
     dec  cx
     jnz  L1
     mov  byte ptr [di], '$'
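    To check that the pre-decrementing loop copies every character, here is the same logic in Python, again only a sketch with SI, DI and CX modeled as integers. The name `reverse_predecrement` is hypothetical. Like the assembly, it assumes the length is at least 1 (with CX=0 the dec cx / jnz pair would wrap around to 0FFFFh), so the model rejects a zero length explicitly.

```python
STR1 = b"MADAME$"            # str1 db 'MADAME','$'
STRLEN1 = len(STR1) - 1      # strlen1 dw $ - str1 - 1 -> 6, '$' excluded


def reverse_predecrement(src: bytes, length: int) -> bytes:
    """Model of the improved loop; assumes length >= 1, as the assembly does."""
    if length < 1:
        raise ValueError("length must be at least 1")
    cx = length                  # mov cx, strlen1
    si = length                  # mov si, offset str1 / add si, cx -> just past 'E'
    dst = bytearray()            # strrev, DI starts at its beginning
    while True:
        si -= 1                  # L1: dec si (pre-decrement, then read)
        dst.append(src[si])      # mov al, [si] / stosb (DI advances since DF=0)
        cx -= 1                  # dec cx
        if cx == 0:              # jnz L1
            break
    dst.append(ord("$"))         # mov byte ptr [di], '$'
    return bytes(dst)


print(STRLEN1, reverse_predecrement(STR1, STRLEN1))   # 6 b'EMADAM$'
```

    All 6 characters are now copied inside the loop, so no extra instructions are needed after it.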

    Do note these details:

    • I've cleanly separated the code that displays to the screen from the code that performs the reversing.
    • By placing the dec si instruction before reading at [SI] (we call this pre-decrementing), one instruction could be shaved off before the loop starts at L1.
    • I've replaced every lea with a mov of the variable's offset (e.g. mov dx, offset s1). The result is the same, but each such instruction is 1 byte shorter.
    • I've replaced the slow loop instruction with the equivalent code dec cx jnz L1.
    • I've replaced the pair of instructions mov [di], al inc di with the single equivalent instruction stosb. I could do this because the ES register was set up and I've cleared the direction flag (DF). Your repe cmpsb also depends on DF=0.
    • I've replaced the pair of instructions that write a new $ terminator by just 1 instruction mov byte ptr [di], '$'.
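    The palindrome comparison relies on DF=0 as well. Below is a rough Python model of REPE CMPSB in that forward direction. `repe_cmpsb` is a hypothetical helper, and treating CX=0 as "equal" is a simplification (the real instruction leaves the flags unchanged in that case).

```python
def repe_cmpsb(src: bytes, dst: bytes, cx: int) -> bool:
    """Model of REPE CMPSB with DF=0; True means ZF=1 (no mismatch found)."""
    si = di = 0                  # SI -> str1, DI -> strrev
    zf = True                    # simplification for CX=0, see the lead-in
    while cx != 0:
        zf = src[si] == dst[di]  # compare DS:[SI] with ES:[DI]
        si += 1                  # DF=0: both pointers move forward
        di += 1
        cx -= 1
        if not zf:               # REPE stops at the first mismatch
            break
    return zf


print(repe_cmpsb(b"MADAME$", b"EMADAM$", 6))   # False -> jne Not_Palin is taken
print(repe_cmpsb(b"MADAM$", b"MADAM$", 5))     # True  -> 'String is Palindrome.'
```

    With the question's 'MADAME' the very first bytes already differ ('M' vs 'E'), so the program reports that the string is not a palindrome.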