assembly gcc clang x86-64 addressing-mode

Why do GCC and Clang stop using RIP relative loads for arrays bigger than 16MB?


My understanding is that RIP-relative addressing should work for displacements of up to 2 GB, but for some reason GCC (14.2) and Clang (19.1.0) stop using it when loading a value that sits 16 MB or more past the start of the array.

Given this code:

const int size = 1 << 22;

int small_array[size];

// 6 byte mov
int load_small_arry() {
    return small_array[(sizeof(small_array)/sizeof(int)-1)];
}

int big_array[size + 1];

// 5 byte mov + 6 byte mov in clang
// 9 byte mov in gcc
int load_big_arry() {
    return big_array[(sizeof(big_array)/sizeof(int)-1)];
}

I get this assembly from GCC (Clang's output, in the Godbolt link, is different but also switches away from RIP-relative addressing):

load_small_arry():
 mov    eax,DWORD PTR [rip+0x0]        # 6 <load_small_arry()+0x6>
    R_X86_64_PC32 small_array+0xfffff8
 ret
 nop    WORD PTR [rax+rax*1+0x0]
load_big_arry():
 movabs eax,ds:0x0
    R_X86_64_64 big_array+0x1000000
 ret

This is a larger encoding, so I'm not sure why it would be preferred.

Godbolt link


Solution

  • The relevant code in GCC is here. It isn't really specific to RIP-relative addressing. The more general rule is that GCC assumes a value of the form static_label + constant_offset is encodable as a signed 32-bit immediate only when constant_offset < 16MB. There's a comment:

    For CM_SMALL assume that latest object is 16MB before end of 31bits boundary.

    The idea seems to be that they want to support pointers like static_label + constant_offset even when the resulting address would exceed the 2 GB limit. In the small code model, static_label itself is known to be within that limit, and GCC further assumes that it's at least 16 MB from the end. So as long as constant_offset is below 16 MB, the sum is guaranteed to fit. But once constant_offset reaches 16 MB or more, they no longer trust that the result will fit in a signed 32-bit immediate, and fall back to code that doesn't need it to.

    This situation couldn't arise in well-defined ISO C or C++ code, because you're only allowed to do pointer arithmetic within a single array, and if the array is static, then all of it fits within 2 GB. But maybe they want to support it anyway as an extension of sorts, for compatibility with some code that needs it, or maybe it's permitted in one of the other languages that GCC supports (remember this is in the backend which is used by all the language front-ends).
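
To make the arithmetic concrete, here is a minimal sketch of the rule as described above, applied to the two arrays from the question. It is not GCC's actual source: the names `kAssumedSlack`, `offset_assumed_encodable`, `last_element_offset`, and `elem_count` are made up for illustration, the check only models non-negative offsets, and it assumes a 4-byte `int` as on x86-64.

```cpp
#include <cstdint>

// Assumption: 4-byte int, as on x86-64 Linux.
static_assert(sizeof(int) == 4, "this sketch assumes a 4-byte int");

// Hypothetical name: the slack GCC is described as assuming between the
// last object and the 2 GB boundary in the small code model.
constexpr std::int64_t kAssumedSlack = 16 * 1024 * 1024;  // 16 MB

// Hypothetical helper mirroring the rule "constant_offset < 16MB".
// Only non-negative offsets are modelled here.
constexpr bool offset_assumed_encodable(std::int64_t byte_offset) {
    return byte_offset < kAssumedSlack;
}

// Byte offset of the last element of an int array with `count` elements.
constexpr std::int64_t last_element_offset(std::int64_t count) {
    return (count - 1) * static_cast<std::int64_t>(sizeof(int));
}

// Same element count as `size` in the question.
constexpr std::int64_t elem_count = std::int64_t{1} << 22;

// small_array[elem_count - 1]: offset 0xfffffc, just under 16 MB.
constexpr std::int64_t small_off = last_element_offset(elem_count);
static_assert(small_off == 0xfffffc, "last element of small_array");
static_assert(offset_assumed_encodable(small_off), "RIP-relative load is kept");

// The PC32 addend is the offset minus 4, because RIP points just past the
// 4-byte displacement field that ends this instruction.
constexpr std::int64_t small_pc32_addend = small_off - 4;
static_assert(small_pc32_addend == 0xfffff8, "matches small_array+0xfffff8");

// big_array[elem_count]: offset is exactly 0x1000000, which is not < 16 MB.
constexpr std::int64_t big_off = last_element_offset(elem_count + 1);
static_assert(big_off == 0x1000000, "matches big_array+0x1000000");
static_assert(!offset_assumed_encodable(big_off), "falls back to movabs");
```

All checks run at compile time, so compiling the file (for example with `g++ -std=c++17 -c`) is enough to confirm them. The point is that adding one element moves the last element's offset from 0xfffffc to exactly 0x1000000, which is the first value that fails the `< 16MB` test.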