assembly dos x86-16

Why do we have to add -2 to a string when performing an operation in assembly language?


Data Segment
    str1 db 'MADAME','$' 
    strlen1 dw $-str1  ;calculating the length of the string
  strrev db 20 dup(' ')
  s1 db 'String is:','$'
  NEWLINE DB 10,13,"$"
  str_palin db 'String is Palindrome.','$'
  str_not_palin db 'String is not Palindrome.','$'
Data Ends

Code Segment
  Assume cs:code, ds:data

  Begin:

    mov ax, data
    mov ds, ax
    mov es, ax
    mov cx, strlen1
    add cx, -2

    lea si, str1
    lea di, strrev

    add si, strlen1
    add si, -2
    mov ah, 09h
    lea dx, s1
    int 21h
    mov ah, 09h
    lea dx, str1
    int 21h
    mov ah, 09h
    lea dx, NEWLINE
    int 21h
    L1:
       mov al, [si]
       mov [di], al
       dec si
       inc di
       loop L1
       mov al, [si]
       mov [di], al
       inc di
       mov dl, '$'
       mov [di], dl
       mov cx, strlen1

    Palin_Check:
       lea si, str1
       lea di, strrev
       repe cmpsb
       jne Not_Palin

    Palin:
       mov ah, 09h
       lea dx, str_palin
       int 21h
       jmp Exit

    Not_Palin:
       mov ah, 09h
       lea dx, str_not_palin
       int 21h

    Exit:
       mov ax, 4c00h
       int 21h
Code Ends
End Begin

Solution

  • First instance of adding -2 (add cx, -2)

    Consider

    mov cx, strlen1
    add cx, -2        <-- Can be avoided totally
    

    and also

    L1:
     mov al, [si]
     mov [di], al
     dec si
     inc di
     loop L1
     mov al, [si]     <-- Should stay inside the loop
     mov [di], al     <-- Should stay inside the loop
     inc di           <-- Should stay inside the loop
    

    Because of how strlen1 was defined (strlen1 dw $-str1), it counts the terminating $ as well and holds 7 for 'MADAME','$'. The add cx, -2 (why is this not simply sub cx, 2?) therefore leaves 5 in CX, which is 1 less than the 6 characters of the string. Because of this, your L1 loop has to be followed by 3 extra instructions that copy the one remaining character!
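
    If you would rather keep the original definition, a minimal sketch of the alternative is to subtract 1 instead of 2, so that the loop handles every character itself. This assumes SI and DI are set up as in the question and that strlen1 still holds 7:

    ```asm
        mov  cx, strlen1     ; 7: six letters plus the '$'
        dec  cx              ; 6: the real character count
    L1:
        mov  al, [si]        ; SI starts at the last character and walks backwards
        mov  [di], al
        dec  si
        inc  di
        loop L1              ; 6 iterations, so no trailing copy is needed
    ```

    After the loop DI already points at the place where the new $ terminator belongs.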


  • Second instance of adding -2 (add si, -2)

    lea si, str1
    add si, strlen1
    add si, -2
    

    Here again, why prefer add si, -2 over the more readable sub si, 2?
    Because of how strlen1 was defined (strlen1 dw $-str1) the add si, strlen1 will make SI point behind the terminating $ character.
    Subtracting 1 will make SI point at the terminating $ character and so behind the last character of the string.
    Subtracting 2 will make SI point at the last character of the string.
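
    The pointer arithmetic can be traced byte by byte. This is only an annotated sketch for the string from the question, assuming the original definition where strlen1 holds 7:

    ```asm
    ; offset from str1:   0   1   2   3   4   5   6   7
    ; byte stored there:  M   A   D   A   M   E   $   (strlen1 begins here)
        lea  si, str1        ; SI = str1 + 0
        add  si, strlen1     ; SI = str1 + 7, just past the '$'
        sub  si, 2           ; SI = str1 + 5, the last character 'E'
    ```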


    Suggestion

    Many of the above problems would not exist if you redefined strlen1 so that it does not include the terminating $ character. When people talk about the length of a string they rarely include any terminating character in the count. Such a character (be it $ or zero) is not really part of the string.

    strlen1 dw $ - str1 - 1  ;Length of the string
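
    The assembler evaluates this expression at build time. A small sketch of how the two definitions would sit together in the data segment, assuming the MASM/TASM syntax of the question (the strlen1 line has to follow str1 directly, because $ stands for the current location counter):

    ```asm
    str1     db 'MADAME','$'     ; 7 bytes: six letters plus the terminator
    strlen1  dw $ - str1 - 1     ; 7 - 1 = 6, the terminator is not counted
    ```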
    

    To see everything in context:

     mov  ah, 09h
     mov  dx, offset s1
     int  21h
     mov  ah, 09h
     mov  dx, offset str1
     int  21h
     mov  ah, 09h
     mov  dx, offset NEWLINE
     int  21h
    
     cld                 ;To be absolutely safe
     mov  cx, strlen1    ;The improved definition! db 'MADAME','$' => 6
     mov  di, offset strrev
     mov  si, offset str1
     add  si, cx         ;Now points behind the last character ('E')
    L1:
     dec  si
     mov  al, [si]
     stosb               ;Equivalent to "mov [di], al" "inc di"
     dec  cx
     jnz  L1
     mov  byte ptr [di], '$'
    

    Do note these details:

    • I've cleanly separated the code that displays to the screen from the code that performs the reversing.
    • By placing the dec si instruction before the read from [SI] (this is called pre-decrementing), one instruction could be shaved off before the loop starts at L1.
    • I've replaced every lea by mov (in the MASM syntax of your program that is mov reg, offset label). The result is the same, but the code is 1 byte shorter. Each time.
    • I've replaced the slow loop instruction by equivalent code dec cx jnz L1.
    • I've replaced the pair of instructions mov [di], al inc di by just 1 equivalent instruction stosb. I could do this because the ES register was set up and I've cleared the direction flag (DF). Your repe cmpsb also depends on DF=0.
    • I've replaced the pair of instructions that write a new $ terminator by just 1 instruction mov byte ptr [di], '$'.
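
    With the improved strlen1 the comparison that follows can reuse the same count. A sketch, assuming the labels from the question, ES equal to DS as set up at Begin, and strlen1 defined as $ - str1 - 1:

    ```asm
        cld                      ; cmpsb must walk forwards through both strings
        mov  cx, strlen1         ; 6: only the letters are compared, not the '$'
        mov  si, offset str1
        mov  di, offset strrev
        repe cmpsb               ; compare DS:SI with ES:DI while the bytes are equal and CX > 0
        jne  Not_Palin           ; ZF reflects the last comparison that was made
    ```

    For 'MADAME' the reversed text is 'EMADAM', so the very first comparison already differs and the program reports that the string is not a palindrome.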