Tags: assembly, x86, masm32, string-operations, eflags

How are the SCAS and MOVS instructions affected by the value of the direction EFLAG?


I want to know how setting or clearing the direction flag in EFLAGS changes the way the SCAS and MOVS instructions increment or decrement registers. After reading some webpages, I made the assumptions listed below.

I am using the MASM32 SDK (version unknown; I installed it via Visual MASM's download and installation wizard), with Visual MASM to write the programs and MASM32 Editor to build and link them into objects and executables. My OS is 64-bit Windows 7 Pro.

SCAS

  1. The SCAS instruction "compares a byte in AL or a word in AX with a byte or word pointed to by DI in ES." Therefore, to use SCAS, the target string's address must be moved to EDI and the string to find must be moved to the accumulator register (EAX and its variants).

  2. Setting the direction flag and then using SCAS will stop SCAS from running on 32 bit systems. On 32 bit systems, it is impossible to force SCAS to "scan a string from the end to the start."

  3. Any REP instruction always uses the ECX register as a counter and always decrements ECX regardless of the direction flag's value. This means it is impossible to "scan a string from the end to the beginning" using REP SCAS.

Sources:
SCAS/SCASB/SCASW, Birla Institute of Technology and Science
Scan String, from c9x.me
SCAS/SCASB/SCASW/SCASD — Scan String, from felixcloutier.com
MASM : Using 'String' Instructions, from www.dreamincode.net/forums

Below is part of the code from a program I will refer to in my questions:

;Generic settings from MASM32 editor 
.386
.model flat, stdcall
option casemap: none

.data?
Input db 254 dup(?)
InputCopy db 254 dup(?)
InputLength dd ?, 0
InputEnd dd ?, 0

.data

.code

start:
push 254
push offset Input
call StdIn
mov InputLength, eax

;---Move Last Word---
lea esi, offset Input
sub esi, 4
lea edi, offset InputEnd
movsw

;---Search section---
lea esi, Input
lea edi, InputCopy
movsb

mov ecx, InputLength
mov eax, 0
mov eax, "omit"

lea edi, offset InputEnd
repne scasw
jz close ;jump if a match was found and ZF was set to 1.

  4. The code under the "Search section" searches the string InputEnd 4 bytes (and thus 4 characters) at a time. It scans for the characters in EAX, i.e. the word "omit", ALWAYS starting at the memory address in EDI and then incrementing by an amount based on the suffix of SCAS (B, W, D, Q) (MASM : Using 'String' Instructions, dream-in-code.com).

MOVS

  5. Using the "Move Last Word" section, I am able to get the last byte out of the string Input. I then used MOVSW to move just the last 4 bytes of the string Input to InputEnd, assuming the direction flag is clear. I must define Input as an array of bytes (Input db 32 dup(?)) for the block to work.

  6. Regardless of how I define InputEnd (whether "dd ?, 0" or "db 12 dup(?)"), the operation of the MOVS and SCAS instructions (flags set, registers modified, etc.) will not change. The increment/decrement amount of SCAS and MOVS depends on the suffix (last letter) of the instruction, not on the defined size of the data or of the pointers stored in EDI and ESI.

  7. It is impossible to make MOVS transfer from the beginning to the end of a string. You must get the length of the string, load the corresponding addresses into EDI and ESI, add the length of the string to the addresses stored in EDI and ESI, and finally set the direction flag using std. A danger here is targeting addresses below the source or destination bytes.

  8. It is impossible to reverse a string's letters using MOVS since EDI and ESI are either both decremented or both incremented by MOVS.

Sources (aside from the sites previously listed in the SCAS section):
https://c9x.me/x86/html/file_module_x86_id_203.html
http://faydoc.tripod.com/cpu/movsd.htm

Are these assumptions correct? Is the x86 text on the sites' URLs a sign that the websites have wrong information?


Solution

    First of all, repe/repne scas and cmps aren't fast. Also, the "fast strings" / ERMSB microcode for rep movs and rep stos is only fast with DF=0 (normal / forward / increasing address).

    rep movs with DF=1 is slow. repne scasw is always slow. They can be useful in the rare case where you're optimizing for code-size, though.


    The documentation you linked sets out exactly how movs and scas are affected by DF. Read the Operation section in Intel's manuals.

    Note that it's always a post-increment/decrement, so the first element compared doesn't depend on DF; only the updates to EDI and/or ESI do.
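To make the post-update concrete, here is a small C model of a single SCASB step, written from the Operation pseudocode in Intel's manual rather than taken from real hardware. The ScasState struct and scasb_step function are hypothetical names, and EDI is modeled as an offset into a byte array. Whatever DF is, the byte compared is the one at the starting EDI; DF only decides which way EDI moves afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the state SCASB reads and writes. */
typedef struct {
    const uint8_t *mem; /* stands in for the memory EDI points into */
    size_t edi;         /* offset of the byte to compare next */
    int df;             /* direction flag: 0 = forward, 1 = backward */
    int zf;             /* zero flag produced by the compare */
} ScasState;

/* One SCASB: compare AL with the byte at EDI first, then post-update EDI. */
static void scasb_step(ScasState *s, uint8_t al)
{
    assert(s->df == 0 || s->df == 1);
    s->zf = (al == s->mem[s->edi]); /* uses EDI as it was before the step */
    if (s->df == 0)
        s->edi += 1;                 /* DF=0: increment after the compare */
    else
        s->edi -= 1;                 /* DF=1: decrement after the compare */
}
```

Starting both models at edi = 1, DF=0 and DF=1 compare the same byte; only the final edi (2 versus 0) differs.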

    Your code only depends on DF for the repne scasw. It doesn't matter whether movsb increments (DF=0) or decrements (DF=1) EDI because you overwrite EDI before the next use.


    repne scasw is 16-bit "word" size using AX, like it says in the HTML extracts of Intel's manual that you linked (https://www.felixcloutier.com/x86/scas:scasb:scasw:scasd). That's both the increment and the compare width.
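The same point for SCASW, as a C sketch assuming DF=0 (scasw_step is a hypothetical model, not the instruction itself): only AX, the low 16 bits of EAX, takes part in the compare, and EDI moves by 2 bytes. A dword value such as the one loaded by mov eax, "omit" is therefore never compared in full.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one SCASW with DF=0. Returns the resulting ZF. */
static int scasw_step(const uint8_t *mem, size_t *edi, uint32_t eax)
{
    uint16_t ax = (uint16_t)eax;   /* only the low 16 bits (AX) are compared */
    /* x86 memory is little-endian: low byte first */
    uint16_t word = (uint16_t)(mem[*edi] | (mem[*edi + 1] << 8));
    *edi += 2;                      /* step = operand size of 2 bytes */
    return word == ax;
}
```

With EAX = 0xABCD1234 and memory bytes 34 12, the first step reports a match even though the upper half of EAX (0xABCD) appears nowhere in memory.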

    If you want overlapping dword compares of EAX, you can't use scasw.

    You could use scasd in a loop, but then you'd have to decrement edi to create overlap. So really you should just use a normal cmp [edi], eax and add edi, 2 if you only want to check even positions.

    (Or preferably use SSE2 SIMD pcmpeqd to implement memmem for a 4-byte search "needle". Look at an optimized implementation like glibc's for ideas, or a strstr implementation but take out the checks for a 0 terminator in the "haystack".)
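The plain-loop alternative (cmp [edi], eax then add edi, 2) can be modeled in C as below. This is a sketch assuming the 4-byte needle is compared against little-endian dword loads, as x86 does; load_dword and find_dword_even are hypothetical helpers. Because the pointer moves by 2 while each compare reads 4 bytes, consecutive windows overlap, which neither scasd (step 4) nor scasw (2-byte compare) can do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian dword load, the way cmp [edi], eax reads memory on x86. */
static uint32_t load_dword(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Compare a full dword at each even offset (cmp [edi], eax), then step
   by 2 (add edi, 2). Returns the first matching offset, or -1. */
static long find_dword_even(const uint8_t *buf, size_t len, uint32_t eax)
{
    for (size_t edi = 0; edi + 4 <= len; edi += 2) {
        if (load_dword(buf + edi) == eax)
            return (long)edi;
    }
    return -1;
}
```

A needle starting at an odd offset is missed by design; stepping by 1 instead of 2 checks every position.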

    repne scasd does not implement strstr or memmem; it only searches for a single element. With byte operand size, it implements memchr.
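To see why repne scasb amounts to memchr and nothing more, here is a C model of mov ecx, n / repne scasb with DF=0. The function name repne_scasb is hypothetical; the loop follows the documented behavior: compare, update EDI and ECX, stop on ZF=1 or ECX=0. It searches for one byte value, not for a sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of: mov ecx, n / repne scasb, with DF=0.
   Each iteration compares AL with [EDI], then EDI++ and ECX--; the loop
   stops on a match (ZF=1) or when ECX reaches 0. Like memchr, it returns
   a pointer to the first matching byte (EDI - 1 after a hit) or NULL. */
static const uint8_t *repne_scasb(const uint8_t *edi, size_t ecx, uint8_t al)
{
    int zf = 0;
    while (ecx != 0) {
        zf = (al == *edi);
        edi++;
        ecx--;
        if (zf)
            break;
    }
    return zf ? edi - 1 : NULL;
}
```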


    > On 32 bit systems, it is impossible to force SCAS to "scan a string from the end to the start."

    rep scas doesn't operate on (implicit-length) C-style strings at all; it works on explicit-length strings. Therefore you can just point EDI at the last element of the buffer.

    Unlike strrchr, you don't have to find the end of the string as well as the last match; you know (or can calculate) where the end of the string is. Perhaps calling them "strings" is the problem; the x86 rep-string instructions actually work on known-size buffers. That's why they take a count in ECX and don't also stop on a terminating 0 byte.

    Use lea edi, [buf + ecx - 1] to set up for std ; rep scasb. Or lea edi, [buf + ecx*2 - 2] to set up for backwards rep scasw on a buffer with ECX word elements. (Generate a pointer to the last element = buf + size - 1 = buf-1 + size)
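A C model of that backward setup (lea edi, [buf + ecx - 1], std, repne scasb), offered as a sketch: EDI is modeled as an offset, and last_index_of is a hypothetical name. Starting at the last element with DF=1 finds the last occurrence, similar to the GNU extension memrchr. Real code would follow the search with cld so DF is clear again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of: lea edi, [buf + ecx - 1] / std / repne scasb / cld.
   EDI starts at the last element and DF=1 post-decrements it, so the first
   hit is the LAST occurrence. Returns its index, or -1 if there is none. */
static long last_index_of(const uint8_t *buf, size_t n, uint8_t al)
{
    size_t ecx = n;
    size_t edi = n - 1;   /* offset of the last element; unused if n == 0 */
    int zf = 0;
    while (ecx != 0) {
        zf = (al == buf[edi]);
        edi--;            /* DF=1: post-decrement; unsigned wrap at 0 is harmless */
        ecx--;
        if (zf)
            break;
    }
    return zf ? (long)(edi + 1) : -1;  /* EDI overshot the match by one */
}
```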

    > Any REP instruction always uses the ECX register as a counter and always decrements ECX regardless of the direction flag's value. This means it is impossible to "scan a string from the end to the beginning" using REP SCAS.

    This just makes zero sense. Of course it decrements; ECX=0 is how the search ends on no-match. If you want the position relative to the end after searching from the end, you can use length - ecx, which is the match's 1-based position counted from the end. Or do pointer subtraction on EDI.
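A C model that checks this ECX arithmetic for a backward repne scasb (backward_scan_ecx is a hypothetical helper). A match at index i takes n - i iterations, so the final ECX equals i and length - ECX is the 1-based position from the end. ECX is 0 both for a match at index 0 and for no match, so ZF is what tells them apart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of std / repne scasb started at the last element.
   Returns the final ECX and reports ZF (match found) through *found. */
static size_t backward_scan_ecx(const uint8_t *buf, size_t n, uint8_t al, int *found)
{
    assert(found != NULL);
    size_t ecx = n;
    size_t edi = n - 1;   /* offset of the last element; unused if n == 0 */
    *found = 0;
    while (ecx != 0) {
        *found = (al == buf[edi]);
        edi--;            /* DF=1: post-decrement */
        ecx--;
        if (*found)
            break;
    }
    return ecx;
}
```

For the 4-byte buffer xyzy, searching for y stops on the last element with ECX = 3 (its index), and 4 - 3 = 1 marks it as the last element. Searching for x leaves ECX = 0 with ZF=1, the same ECX as a failed search.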

    > 6: not the data type of registers stored in EDI and ESI.

    Assembly language doesn't have types; that's a higher level concept. It's up to you to do the right thing to the right bytes in asm. EDI / ESI are registers; the pointers stored in them are just integers that have no type in asm. You don't "store a register in EDI", it is a register. Maybe you meant to say "pointer stored in EDI"? Registers don't have types; a bit-pattern (aka integer) in a register can be signed 2's complement, unsigned, a pointer, or whatever other interpretation you want.

    But yes, any magic that MASM does based on how you defined a symbol is completely gone once you have a pointer in a register.

    Remember that movsd is just a 1-byte instruction in x86 machine code, just the opcode. It has only 3 inputs: DF, and two 32-bit integers in EDI and ESI, and they're all implicit (implied by the opcode byte). There's no other context that can affect what the hardware does. Every machine instruction has its documented effect on the architectural state of the machine; nothing more, nothing less.
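As a C model of that point (movsd_step is hypothetical, with ESI and EDI modeled as offsets into one flat byte array): the step looks only at DF, ESI and EDI, copies exactly 4 bytes, and moves both pointers by 4 in the same direction. How the symbols were declared in MASM never enters into it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one MOVSD. Its only inputs are DF, ESI and EDI
   (offsets into one flat byte array here) plus the bytes in memory. */
static void movsd_step(uint8_t *mem, size_t *esi, size_t *edi, int df)
{
    assert(df == 0 || df == 1);
    memmove(mem + *edi, mem + *esi, 4);  /* copy exactly one dword */
    if (df == 0) {
        *esi += 4;                        /* DF=0: both pointers move up */
        *edi += 4;
    } else {
        *esi -= 4;                        /* DF=1: both pointers move down */
        *edi -= 4;
    }
}
```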

    > 7: It is impossible to make MOVS transfer from the beginning to the end of a string. ... std

    No, std makes a transfer go backwards, from end to beginning. DF=0 is the normal / forward direction. Calling conventions guarantee / require that DF=0 on entry and exit from any function so you don't need a cld before using string instructions; you can just assume that DF=0. (And you should normally leave DF=0.)
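A C model (rep_movsb_backward is hypothetical) of the usual reason to use std with rep movsb: copying an overlapping block to a higher address, as memmove must. ESI and EDI start at the last byte of each block and move down. Real code would execute cld afterwards to restore the DF=0 that calling conventions expect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of:
     lea esi, [src + n - 1] / lea edi, [dst + n - 1] / mov ecx, n
     std / rep movsb / cld
   Bytes are copied from the last one down to the first, which is the
   direction an overlapping copy to a higher address needs. */
static void rep_movsb_backward(uint8_t *mem, size_t src, size_t dst, size_t n)
{
    assert(dst >= src);  /* backward copying is always safe when dst >= src */
    size_t esi = src + n - 1;
    size_t edi = dst + n - 1;
    for (size_t ecx = n; ecx != 0; ecx--) {
        mem[edi] = mem[esi];
        esi--;           /* DF=1: post-decrement both pointers */
        edi--;
    }
}
```

Copying abcdef two bytes up within the same buffer gives ababcdef; a forward copy would overwrite c and d before reading them and produce abababab.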

    > 8: It is impossible to reverse a string's letters using MOVS since EDI and ESI are either both decremented or both incremented by MOVS.

    That's correct. And a lods / std / stos / cld loop is not worth it vs. a normal loop that uses dec or sub on one of the pointers. You can use lods for the read part and manually write backwards. And you can go 4x faster by loading a dword and using bswap to reverse it in a register, so you're copying in chunks of 4 reversed bytes.
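A portable C sketch of the 4-bytes-at-a-time reverse copy: read a dword going forward (where lods or a plain load would be used), byte-swap it, and store it while walking the destination backwards. bswap32 stands in for the bswap instruction; all helper names are hypothetical, and for brevity the length is assumed to be a multiple of 4 (a real version needs a byte loop for the tail).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable stand-in for the x86 bswap instruction. */
static uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Little-endian dword load and store, matching x86 memory order. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Reverse-copy n bytes from src to dst, 4 bytes per iteration: load a
   dword going forward, bswap it, store it while walking dst backwards. */
static void reverse_copy4(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(n % 4 == 0);  /* sketch only: no tail handling */
    for (size_t i = 0; i < n; i += 4)
        store_le32(dst + n - 4 - i, bswap32(load_le32(src + i)));
}
```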

    Or for in-place reversal: 2 loads into tmp regs, then 2 stores, then move the pointers towards each other until they cross. (Also works with bswap or movbe.)
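The in-place version, sketched in C one byte at a time for clarity (reverse_in_place is a hypothetical name); a faster version would load dwords and byte-swap them instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* In-place reversal: two loads into temporaries, two stores, then move the
   pointers toward each other until they meet or cross. */
static void reverse_in_place(uint8_t *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    if (n < 2)
        return;
    uint8_t *lo = buf;
    uint8_t *hi = buf + n - 1;
    while (lo < hi) {
        uint8_t a = *lo;  /* load 1 */
        uint8_t b = *hi;  /* load 2 */
        *lo++ = b;        /* store 1, then advance */
        *hi-- = a;        /* store 2, then step back */
    }
}
```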


    Other weird inefficiencies in your code:

        mov eax, 0                ;; completely pointless, EAX is overwritten by next instruction
        mov eax, "omit"
    

    Also, lea with a disp32 addressing mode is a pointless waste of code-size. Only use LEA for static addresses in 64-bit code, for RIP-relative addressing. Use mov esi, OFFSET Input instead, like you're doing with push offset Input earlier.