I understand how segmentation works, and that paging is the preferred memory-management mechanism in modern operating systems. But I am not sure in what way the segment registers end up being used, or left unused:
A funny combination of them, perhaps. What happens from a high-level perspective (if it can be called high) is that most segments are configured with a base of 0 and a limit of 0xFFFFFFFF (fs and gs may still be used for special purposes, such as thread-local storage or per-CPU data, though).
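To make that "flat" configuration concrete, here is a sketch (my illustration, not anything from a particular OS) of what a flat 32-bit data descriptor looks like in a GDT, in NASM syntax: base = 0 and limit = 0xFFFFF with 4 KiB granularity, so the segment spans the full 0 to 0xFFFFFFFF range.

; Flat 32-bit data descriptor: base = 0, limit = 0xFFFFF, 4 KiB granularity,
; so the segment covers linear addresses 0..0xFFFFFFFF.
flat_data_descriptor:
    dw 0xFFFF       ; limit bits 0-15
    dw 0x0000       ; base bits 0-15
    db 0x00         ; base bits 16-23
    db 0x92         ; access byte: present, ring 0, writable data segment
    db 0xCF         ; flags (G=1, D/B=1) + limit bits 16-19
    db 0x00         ; base bits 24-31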
But configuring a segment with a non-zero base may have performance consequences. For example, on AMD K8 and K10, configuring the code segment with a non-zero base increases the latency of branch mispredictions by two cycles, and a general address takes a cycle longer to compute if a segment with a non-zero base is involved. This suggests that the processor has a special fast path for segments with a base of zero, so that the base does not participate in the address calculation at all rather than being added as a zero (which would still take time).
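To illustrate what "participates in the calculation" means here: architecturally, the linear address of a segmented access is the segment base plus the usual effective address, for example (illustrative only):

; Linear address = GS.base + (rbx + rcx*8 + 16).
; With a zero base, the first term could be skipped entirely
; instead of being added as a zero.
mov rax, [gs:rbx + rcx*8 + 16]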
I could find no reference to this effect existing on any other µarch, but that may simply be because it has not been explored much: non-zero segment bases are relatively rare, especially in performance-sensitive code. In a quick test, a similar effect seems to exist on Haswell, with this code (which skips some trivial set-up):
.loop:
    mov rax, [rsp+rax]   ; load feeds its own address: a dependency chain that measures load latency
    add ecx, 1           ; ecx starts negative and counts up towards zero
    jnz .loop
It runs two cycles per iteration faster (5 cycles/iteration) than this code (7 cycles/iteration):
.loop:
    mov rax, [gs:rax]    ; same dependency chain, but through a gs override (gs set up with a non-zero base)
    add ecx, 1
    jnz .loop
Possibly that means more Intel µarchs are affected as well, though the comparison may be inaccurate: since this is 64-bit code, no segment base is involved in the first loop at all, and perhaps that (rather than the base being non-zero) is what mattered.
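For reference, here is a minimal sketch of the kind of set-up such a test needs; this is my reconstruction of the "trivial set-up" mentioned above, not the exact harness, and it assumes Linux with the FSGSBASE extension enabled for user space (otherwise the gs base can be set with the arch_prctl(ARCH_SET_GS, ...) syscall):

    mov qword [rsp], 0       ; pointer-chase target: the dependent load always returns 0
    xor eax, eax             ; rax = 0, so the first load reads [base + 0]
    mov ecx, -100000000      ; 100M iterations; add ecx,1 / jnz falls through at 0
    mov rdi, rsp
    wrgsbase rdi             ; give gs a non-zero base (needed for the second loop only)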