How did Pentium III CPUs handle multiple instruction prefixes from the same group?

The Intel x86 specification states that using more than one instruction prefix from the same group results in undefined behavior. In practice, how did Pentium III Coppermine CPUs react in that situation? Sadly, I don't have a chip to test on.

Solution

Although you already know this, I'll start by stating it for clarity. x86 instructions can carry up to four meaningful prefixes (one from each of four groups) that alter the processor's interpretation of the instruction. From the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2A, Section 2.1:

2.1 INSTRUCTION FORMAT FOR PROTECTED MODE, REAL-ADDRESS MODE, AND VIRTUAL-8086 MODE

The Intel 64 and IA-32 architectures instruction encodings are subsets of the format shown in Figure 2-1. Instructions consist of optional instruction prefixes (in any order), primary opcode bytes (up to three bytes), an addressing-form specifier (if required) consisting of the ModR/M byte and sometimes the SIB (Scale-Index-Base) byte, a displacement (if required), and an immediate data field (if required).

Figure 2-1. Intel 64 and IA-32 Architectures Instruction Format

2.1.1 Instruction Prefixes

Instruction prefixes are divided into four groups, each with a set of allowable prefix codes. For each instruction, it is only useful to include up to one prefix code from each of the four groups (Groups 1, 2, 3, 4). Groups 1 through 4 may be placed in any order relative to each other.

Group 1

Lock and repeat prefixes:

LOCK prefix is encoded using F0H.

REPNE/REPNZ prefix is encoded using F2H. Repeat-Not-Zero prefix applies only to string and input/output instructions. (F2H is also used as a mandatory prefix for some instructions.)

REP or REPE/REPZ is encoded using F3H. The repeat prefix applies only to string and input/output instructions. F3H is also used as a mandatory prefix for POPCNT, LZCNT and ADOX instructions.

Bound prefix is encoded using F2H if the following conditions are true:

CPUID.(EAX=07H, ECX=0):EBX.MPX[bit 14] is set.

BNDCFGU.EN and/or IA32_BNDCFGS.EN is set.

When the F2 prefix precedes a near CALL, a near RET, a near JMP, or a near Jcc instruction (see Chapter 17, “Intel® MPX,” of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1).

Group 2

Segment override prefixes:

2EH—CS segment override (use with any branch instruction is reserved).

36H—SS segment override prefix (use with any branch instruction is reserved).

3EH—DS segment override prefix (use with any branch instruction is reserved).

26H—ES segment override prefix (use with any branch instruction is reserved).

64H—FS segment override prefix (use with any branch instruction is reserved).

65H—GS segment override prefix (use with any branch instruction is reserved).

Branch hints (no longer used; reserved):

2EH—Branch not taken (used only with Jcc instructions).

3EH—Branch taken (used only with Jcc instructions).

Group 3

Operand-size override prefix is encoded using 66H (66H is also used as a mandatory prefix for some instructions).

Group 4

67H—Address-size override prefix.

The LOCK prefix (F0H) forces an operation that ensures exclusive use of shared memory in a multiprocessor environment. See “LOCK—Assert LOCK# Signal Prefix” in Chapter 3, “Instruction Set Reference, A-L,” for a description of this prefix.

Repeat prefixes (F2H, F3H) cause an instruction to be repeated for each element of a string. Use these prefixes only with string and I/O instructions (MOVS, CMPS, SCAS, LODS, STOS, INS, and OUTS). Use of repeat prefixes and/or undefined opcodes with other Intel 64 or IA-32 instructions is reserved; such use may cause unpredictable behavior.

Some instructions may use F2H,F3H as a mandatory prefix to express distinct functionality.

Branch hint prefixes (2EH, 3EH) allow a program to give a hint to the processor about the most likely code path for a branch. Use these prefixes only with conditional branch instructions (Jcc). Other use of branch hint prefixes and/or other undefined opcodes with Intel 64 or IA-32 instructions is reserved; such use may cause unpredictable behavior.

The operand-size override prefix allows a program to switch between 16- and 32-bit operand sizes. Either size can be the default; use of the prefix selects the non-default size.

Some SSE2/SSE3/SSSE3/SSE4 instructions and instructions using a three-byte sequence of primary opcode bytes may use 66H as a mandatory prefix to express distinct functionality.

Other use of the 66H prefix is reserved; such use may cause unpredictable behavior.

The address-size override prefix (67H) allows programs to switch between 16- and 32-bit addressing. Either size can be the default; the prefix selects the non-default size. Using this prefix and/or other undefined opcodes when operands for the instruction do not reside in memory is reserved; such use may cause unpredictable behavior.

Notice that it doesn't actually say that multiple instruction prefixes from the same group result in "undefined behavior." Rather, it just says that it is "only useful" to include up to one from each group. That leaves things pretty unspecified.

Looks to me like the only formal guarantees you get from the specification are that certain, specific combinations of instructions and prefixes can result in either "unpredictable behavior" or an exception, and that any single instruction longer than 15 bytes results in an "Invalid Opcode" exception.
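Because the manual only says one prefix per group is "useful", nothing stops a decoder from meeting a long run of redundant prefixes; the 15-byte limit is what ultimately bounds it. Here is a minimal C sketch, assuming 32-bit legacy decoding only (no REX, VEX, or EVEX), that classifies legacy prefix bytes by the groups quoted above and rejects a prefix run that leaves no room for an opcode within 15 bytes. The names `prefix_group` and `count_prefixes` are hypothetical, and the sketch does not model which exception a real CPU raises.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_INSN_LEN 15 /* architectural maximum instruction length */

/* Maps a byte to its legacy prefix group (1-4), or 0 if it is not a
 * legacy prefix. Whether F2H/F3H/66H act as mandatory prefixes, or
 * 2EH/3EH as branch hints, depends on the opcode, not on this byte. */
static int prefix_group(uint8_t b)
{
    switch (b) {
    case 0xF0: case 0xF2: case 0xF3:   /* LOCK, REPNE, REP/REPE */
        return 1;
    case 0x2E: case 0x36: case 0x3E:   /* CS, SS, DS overrides */
    case 0x26: case 0x64: case 0x65:   /* ES, FS, GS overrides */
        return 2;
    case 0x66:                         /* operand-size override */
        return 3;
    case 0x67:                         /* address-size override */
        return 4;
    default:
        return 0;
    }
}

/* Counts the legacy prefixes at the start of buf[0..len), redundant
 * same-group prefixes included. Returns -1 if the prefixes alone fill
 * 15 bytes, leaving no room for an opcode. The caller must still check
 * that an opcode byte actually follows. */
static int count_prefixes(const uint8_t *buf, size_t len)
{
    size_t i = 0;

    while (i < len && prefix_group(buf[i]) != 0) {
        if (++i >= MAX_INSN_LEN)
            return -1; /* opcode would be byte 16 or later */
    }
    return (int)i;
}
```

With this model, 14 redundant 66H prefixes followed by a one-byte opcode still fit in 15 bytes, while a 15th prefix pushes the opcode past the limit.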

That leaves empirical testing: applying multiple prefixes from the same group to instructions that otherwise accept that prefix. To that end, and as requested, I ran the following tests on a Pentium III Coppermine¹:

Group 1: Various combinations of multiple REPE (F3) and REPNE (F2) prefixes on a CMPSB instruction (A6).

Only the last prefix that is encountered has an effect; other prefixes from the same group that precede it are ignored.

In fact, this appears to be standard behavior for all x86 processors, and it is consistent with how Microsoft's disassembler displays the code: the leading (ignored) prefixes are not shown as part of the instruction.

Group 2: Multiple segment override prefixes on a load (MOV) instruction.

Again, the very last prefix is the only one that matters. All others are ignored. And again, this seems to be standard for all x86 processors.

(I didn't bother to test branch-hint prefixes, either alone or in combination with segment-override prefixes, as these branch hints are ignored on all processors but the Pentium 4.)

Group 3: Multiple operand-size override prefixes (66h).

Repeated prefixes are ignored, so multiple 66h prefixes have exactly the same effect as one 66h prefix. They do not cancel each other out or any such thing.

Various sources online confirm that this is standard behavior for all x86 processors.

Group 4: Multiple address-size override prefixes (67h).

Same as Group 3: repeated prefixes are ignored.
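The Group 1 result can be combined with the quoted rule that repeat prefixes apply only to string and I/O instructions. The following is a minimal C sketch, with hypothetical names (`effective_rep`, `enum rep_kind`), that reports which repeat prefix takes effect when F2H and F3H are mixed: the one closest to the opcode, as observed above on CMPSB (A6). It assumes 32-bit legacy encoding and simply reports "no repeat" for non-string opcodes, which also covers mandatory-prefix uses such as F3 90 (PAUSE).

```c
#include <stddef.h>
#include <stdint.h>

enum rep_kind { REP_NONE, REP_REPE, REP_REPNE };

/* One-byte opcodes of the string and I/O instructions: INS 6C/6D,
 * OUTS 6E/6F, MOVS A4/A5, CMPS A6/A7, STOS AA/AB, LODS AC/AD,
 * SCAS AE/AF. (A8/A9 are TEST, so they are excluded.) */
static int is_string_op(uint8_t op)
{
    return (op >= 0x6C && op <= 0x6F) ||
           (op >= 0xA4 && op <= 0xA7) ||
           (op >= 0xAA && op <= 0xAF);
}

/* Legacy prefixes other than F2H/F3H; these are skipped over. */
static int is_other_prefix(uint8_t b)
{
    switch (b) {
    case 0xF0: case 0x2E: case 0x36: case 0x3E: case 0x26:
    case 0x64: case 0x65: case 0x66: case 0x67:
        return 1;
    default:
        return 0;
    }
}

/* Returns the repeat prefix in effect for the instruction in buf[0..len).
 * REPE vs. REPNE only changes behavior for CMPS and SCAS; for the other
 * string instructions this just reports which byte won. */
static enum rep_kind effective_rep(const uint8_t *buf, size_t len)
{
    enum rep_kind rep = REP_NONE;
    size_t i;

    for (i = 0; i < len; i++) {
        uint8_t b = buf[i];

        if (b == 0xF3)
            rep = REP_REPE;   /* a later F2H/F3H overrides an earlier one */
        else if (b == 0xF2)
            rep = REP_REPNE;
        else if (!is_other_prefix(b))
            break;            /* first non-prefix byte is the opcode */
    }
    if (i == len || !is_string_op(buf[i]))
        return REP_NONE;      /* no opcode, or REP has no defined meaning */
    return rep;
}
```

For example, `F2 F3 A6` decodes as REPE CMPSB and `F3 F2 A6` as REPNE CMPSB under this model.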

In summary: all but the last prefix from a particular group are effectively ignored, and the last prefix encountered on an instruction is the one that takes effect. This appears to be true for all x86 processors, meaning that emulation code does not need to special-case this behavior for any particular generation or microarchitecture. However, prefixes that have no effect in one context might be repurposed to have some meaning on future processors, so this is something to watch out for.
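For an emulator, the summarized rule reduces to "overwrite a per-group slot for each prefix byte". Below is a minimal C sketch of that idea under stated assumptions: 32-bit legacy decoding only, hypothetical names (`struct prefixes`, `decode_prefixes`, `operand_size`), and operand sizes limited to 16/32 bits. It applies the rule literally, so LOCK (which shares group 1 with F2H/F3H) followed by a REP prefix would keep only the REP; that combination was not among the tests above, so treat it as an assumption rather than observed behavior.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_INSN_LEN 15

struct prefixes {
    uint8_t last[5];  /* last[1..4]: last byte seen in that group; 0 = none */
    size_t  count;    /* prefix bytes consumed, including ignored ones */
};

static int prefix_group(uint8_t b)
{
    switch (b) {
    case 0xF0: case 0xF2: case 0xF3:                     return 1;
    case 0x2E: case 0x36: case 0x3E:
    case 0x26: case 0x64: case 0x65:                     return 2;
    case 0x66:                                           return 3;
    case 0x67:                                           return 4;
    default:                                             return 0;
    }
}

/* Applies "last prefix in a group wins". Returns false if no opcode byte
 * follows the prefixes within the first 15 bytes. */
static bool decode_prefixes(const uint8_t *buf, size_t len,
                            struct prefixes *out)
{
    size_t i = 0;
    int g;

    *out = (struct prefixes){0};
    while (i < len && i < MAX_INSN_LEN && (g = prefix_group(buf[i])) != 0)
        out->last[g] = buf[i++];  /* overwrite any earlier same-group prefix */
    out->count = i;
    return i < len && i < MAX_INSN_LEN;
}

/* 66H selects the non-default operand size. Repeats collapse into a
 * single last[3] entry, so two 66H bytes do not cancel each other out. */
static int operand_size(const struct prefixes *p, int default_size)
{
    if (p->last[3] == 0)
        return default_size;
    return default_size == 32 ? 16 : 32;
}
```

A caller would run `decode_prefixes` first and then read the opcode at `buf[p.count]`; for `2E 64 8B 00`, the effective segment prefix is 64H (FS), matching the Group 2 result.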

If possible, you might consider offloading this interpretation work to a decoder library, specifically one written by Intel: the Intel XED library (repository on GitHub). You give it anywhere from 1 to 15 bytes, and it returns the decoded instruction, including its prefixes and operands. Decoding is the hard part of x86, so this should save you a lot of headaches. It implements the same algorithm described here; see, e.g., these notes and this code.

__
¹ Specifically, an Intel Pentium III EB @ 866 MHz (Family 6, Model 8, Stepping 6, Revision cC0). This is a Socket 370 FC-PGA chip, running on a Compaq Deskpro EN system with an Intel 815-based motherboard (133 MHz FSB). In case it matters (and it obviously shouldn't), the operating environment was Windows 2000 SP4. I used MASM and Visual Studio's debugger for testing.