assembly, masm

Skip spaces - REPZ SCASB


I'm trying to parse command-line arguments in an assembly program, and I want to skip leading spaces. From what I know, and from what the web confirms (e.g. http://faydoc.tripod.com/cpu/repz.htm), a plain REPE SCASB or REPZ SCASB should do the trick. But it doesn't. Instead of stopping at the first non-space character, it stops at the one after it (as I confirmed in Turbo Debugger). This happens every time: whether there is one space or fifty, ES:DI always points to the byte following the first non-space character. I need this instruction to work correctly so that I can use the counter in CX for validation.

I use the MASM assembler (ML.EXE) under DOSBox on Ubuntu 12.04.

.287
dane1 segment

dane1 ends

code1 segment

start1:

;   ******** CODE ********

; Init stack
mov sp, offset wstosu
mov ax, seg wstosu
mov ss, ax


; transcription start       ==========

mov cs:[paramsAddress], es
xor cx, cx
mov cl, byte ptr es:[0080h]     ; number of characters from command line

cmp cx, 0
je error

mov al, ' '
mov di, 0081h       ; beginning of actual console input

cld

repe scasb          ; skip spaces

; here es:di should point to the first 
    ; non-space character (or end of input) 
cmp cx, 0
je error

Solution

  • You're seeing the correct behavior. The scasb instruction compares AL with the byte at ES:DI, sets the flags from that comparison, and then increments DI (or decrements it, depending on DF). The repe prefix wraps scasb in a loop: if CX is non-zero, it executes scasb, decrements CX, and repeats as long as ZF is set and CX is still non-zero. So when the first character past the leading spaces is scanned, the comparison fails and repetition stops, but DI has already been incremented to the next character and CX has already been decremented for it. You'll just need to account for that in your code, most simply by decrementing DI (and incrementing CX if you rely on the count).

    As others have said, on x86 processors from the last 25 years or so, the string instructions offer no advantage other than occasionally slightly smaller code.
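
The adjustment described above could look roughly like the following. This is a minimal sketch, not a drop-in replacement: it assumes ES still holds the PSP segment and reuses the `error` label from the question. It also assumes `error` is within short-jump range. Testing ZF with `je` immediately after the scan (rather than comparing CX with 0) is a choice made here, because CX is also 0 when the very last byte of the input is the first non-space character.

```asm
        xor     cx, cx
        mov     cl, byte ptr es:[0080h] ; CX = length of the command tail
        jcxz    error                   ; empty tail: repe scasb would not execute at all

        mov     al, ' '
        mov     di, 0081h               ; first byte of the command tail
        cld                             ; scan forward (DI increments)

        repe    scasb                   ; stops one byte past the first non-space
        je      error                   ; ZF still set: every scanned byte was a space

        dec     di                      ; step back onto the non-space byte
        inc     cx                      ; count that byte as not yet consumed

        ; here ES:DI points to the first non-space character
        ; and CX holds the number of bytes left, including that one
```

With these two corrections, CX and DI describe the remaining input consistently, so the count can be used for validation.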