I understand how segmentation works, and that paging is the preferred memory-management mechanism in modern operating systems. But I am not sure in what way the segment registers end up being used, or left unused:
A funny combination of them, perhaps. What happens from a high-level perspective (if it can be called high) is that most segments are configured with a base of 0 and a limit of 0xFFFFFFFF (fs and gs may still be used for special purposes, such as thread-local storage or per-CPU data, though).
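To make that "flat" configuration concrete, here is a sketch (my illustration, not anything from a particular OS) of what a flat 32-bit data descriptor looks like in a GDT, in NASM syntax: base = 0 and limit = 0xFFFFF with 4 KiB granularity, so the segment spans the full 0 to 0xFFFFFFFF range.

; Flat 32-bit data descriptor: base = 0, limit = 0xFFFFF, 4 KiB granularity,
; so the segment covers linear addresses 0..0xFFFFFFFF.
flat_data_descriptor:
    dw 0xFFFF       ; limit bits 0-15
    dw 0x0000       ; base bits 0-15
    db 0x00         ; base bits 16-23
    db 0x92         ; access byte: present, ring 0, writable data segment
    db 0xCF         ; flags (G=1, D/B=1) + limit bits 16-19
    db 0x00         ; base bits 24-31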
But configuring a segment with a non-zero base may have performance consequences. For example, on AMD K8 and K10, configuring the code segment with a non-zero base increases the latency of branch mispredictions by two cycles, and a general address takes a cycle longer to compute if a segment with a non-zero base is involved. This suggests that the processor has a special fast path for segments with a base of zero, so that the base does not participate in the address calculation at all rather than being added as a zero (which would still take time).
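To illustrate what "participates in the calculation" means here: architecturally, the linear address of a segmented access is the segment base plus the usual effective address, for example (illustrative only):

; Linear address = GS.base + (rbx + rcx*8 + 16).
; With a zero base, the first term could be skipped entirely
; instead of being added as a zero.
mov rax, [gs:rbx + rcx*8 + 16]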
I could find no reference to this effect existing on any other µarch, but that may simply be because it has not been explored much: non-zero segment bases are relatively rare, especially in performance-sensitive code. In a quick test, a similar effect seems to exist on Haswell, with this code (which skips some trivial set-up):
.loop:
    mov rax, [rsp+rax]   ; load feeds its own address: a dependency chain that measures load latency
    add ecx, 1           ; ecx starts negative and counts up towards zero
    jnz .loop
It runs two cycles per iteration faster (5 cycles/iteration) than this code (7 cycles/iteration):
.loop:
    mov rax, [gs:rax]    ; same dependency chain, but through a gs override (gs set up with a non-zero base)
    add ecx, 1
    jnz .loop
Possibly that means more Intel µarchs are affected as well, though the comparison may be inaccurate: since this is 64-bit code, no segment base is involved in the first loop at all, and perhaps that (rather than the base being non-zero) is what mattered.
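For reference, here is a minimal sketch of the kind of set-up such a test needs; this is my reconstruction of the "trivial set-up" mentioned above, not the exact harness, and it assumes Linux with the FSGSBASE extension enabled for user space (otherwise the gs base can be set with the arch_prctl(ARCH_SET_GS, ...) syscall):

    mov qword [rsp], 0       ; pointer-chase target: the dependent load always returns 0
    xor eax, eax             ; rax = 0, so the first load reads [base + 0]
    mov ecx, -100000000      ; 100M iterations; add ecx,1 / jnz falls through at 0
    mov rdi, rsp
    wrgsbase rdi             ; give gs a non-zero base (needed for the second loop only)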