assembly, x86, repeat, machine-code, instructions

Why do repe and repne do the same thing before movsb?


I have an assembly test soon, and while preparing, I noticed something strange.
repe movsb kept repeating while ZF=0, but I was taught that repe should repeat while CX is not zero and ZF=1.
I did some testing and found that before movsb, the rep, repe, and repne prefixes all behave the same.
What is the explanation for this?

edit: here is the code:

.model small
    .data
    A db   '   This     is    a    test '
    N  db  27
    .stack 10h
    .code
    mov ax,@data
    mov ds,ax
    mov es,ax
    cld
    mov al,' '
    mov cl,N
    xor ch,ch
    mov di,offset  A
    next:  repe scasb
    jcxz cont        ; jump if cx=0
    dec di
    inc cx
    xchg  si,di      ; swap between si and di
    push  cx
    push  di
    repe  movsb
    pop   di
    pop   cx
    repne scasb
    mov si,di
    jmp next
    cont: .exit
    end

Solution

  • In the machine code, there are actually only two different prefix bytes.

    • 0xF3 is called REP when used with MOVS/LODS/STOS/INS/OUTS (instructions which don't affect flags)
    • 0xF3 is called REPE or REPZ when used with CMPS/SCAS
    • 0xF2 is called REPNE or REPNZ when used with CMPS/SCAS, and is not documented for other instructions.
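
    The byte-level view can be sketched by hand-encoding a few of these instructions with db, in the same MASM-style syntax as the question. The opcode bytes used here (A4 for MOVSB, AE for SCASB) are the standard one-byte encodings. The fragment is meant to be compared against an assembler listing or a disassembly, not run as a program.

    ```asm
    ; The same prefix byte gets a different mnemonic depending on
    ; the opcode that follows it.
        db 0F3h, 0A4h    ; rep   movsb   (F3 before MOVS: plain REP)
        db 0F3h, 0AEh    ; repe  scasb   (F3 before SCAS: REPE / REPZ)
        db 0F2h, 0AEh    ; repne scasb   (F2 before SCAS: REPNE / REPNZ)
        db 0F2h, 0A4h    ; "repne movsb" (F2 before MOVS: not documented)
    ```

    Because REPE and REP are the same F3 byte, the repe movsb in the question assembles to exactly the same bytes as rep movsb.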

    The REP entry in Intel's instruction-set reference manual only documents F3 REP for MOVS, not the F2 prefix. Congratulations, you've found an undocumented encoding for REP MOVSB, at least on whatever CPU you tested this on. :)

    See also the appendix of the NASM manual (linked from the tag wiki), which includes other undocumented opcodes but not this F2 A4 REPNE MOVSB.
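
    One way to check what a particular CPU or emulator does with F2 A4 is a small test like the following. It is a minimal sketch that assumes the same small-model MASM/TASM DOS setup as the question. The labels src, dst, and start are made up for illustration, and the result has to be inspected in a debugger.

    ```asm
    .model small
    .stack 100h
    .data
    src db 'abcd'          ; hypothetical test data
    dst db 4 dup(0)
    .code
    start:
        mov ax,@data
        mov ds,ax
        mov es,ax
        cld                ; DF=0: SI and DI count upward
        mov si,offset src
        mov di,offset dst
        mov cx,4
        xor ax,ax          ; result is zero, so ZF=1
        db 0F2h, 0A4h      ; F2 A4 = repne movsb, emitted as raw bytes
        ; Inspect here: if the F2 prefix acted as plain REP, CX = 0 and
        ; dst holds 'abcd'. MOVSB never writes ZF, so a CPU that applied
        ; the REPNE condition would have stopped after the first byte.
        mov ax,4C00h       ; DOS: terminate with exit code 0
        int 21h
    end start
    ```

    Emitting the bytes with db avoids depending on whether a given assembler accepts repne movsb as source text.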


    Normally, prefixes which don't affect an instruction are ignored, so I would have expected REPNE MOVSB to run identically to just MOVSB. For example, TZCNT is encoded as REP BSF, and on CPUs which don't support BMI1 it simply executes as BSF (which does the same thing except when the source is zero).

    Similarly, REP RET is a common trick to introduce padding to work around a limitation of AMD K8/K10 branch predictors, and runs the same as RET.

    But Intel cautions that this behaviour is not guaranteed, because new instructions can use an encoding that used to be a different instruction with an ignored prefix. For example, LZCNT (encoded as REP BSR) counts leading zeros, whereas BSR returns the index of the highest set bit, so old code that included a REP BSR for some reason would stop working on new CPUs.
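
    The encodings mentioned above can be written out as raw bytes. This is a reading aid rather than a program (the first line would return if executed). The ModRM byte C3, which selects AX as destination and BX as source in 16-bit code, is an arbitrary choice for illustration.

    ```asm
        db 0F3h, 0C3h               ; rep ret: F3 is ignored, runs as RET (C3)
        db 0Fh, 0BCh, 0C3h          ; bsf   ax,bx
        db 0F3h, 0Fh, 0BCh, 0C3h    ; tzcnt ax,bx on CPUs with TZCNT, else bsf ax,bx
        db 0Fh, 0BDh, 0C3h          ; bsr   ax,bx
        db 0F3h, 0Fh, 0BDh, 0C3h    ; lzcnt ax,bx on CPUs with LZCNT, else bsr ax,bx
    ```

    In each pair, the only difference between the old and the new instruction is the leading F3 byte, which is why the same bytes can mean different things on different CPUs.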

    Note that on the original 8086, rep mul/imul negates the result! So historically the prefix hasn't always been completely ignored, and that's probably why Intel only ever documents the ignoring for specific cases where backwards compatibility is actually useful (like rep nop = pause, things like the HLE and BND prefixes, and TZCNT = BSF for non-zero inputs). See also my and other answers on a retrocomputing Q&A.