c++ operating-system undefined-behavior memory-address

Is it undefined behavior to have two pointers with different values referring to the same object?


Note: if, after reading this question, you think "how can that even happen", that is OK. If you want to keep an open mind, the points after the question show how this can happen and why it is useful. Just remember that this is a question and not a tutorial on any of these topics. The comments already contain enough noise and are hard to follow. If you have questions about these topics, I would appreciate it if you posted them as questions on SO instead of in the comments.



Question: If I have an object of type int stored at the address pointed to by c

int* c = /* allocate int (returns unique address) */;
*c = 3;

referred to by two pointers a and b:

int* a = /* create pointer to (*c) */;
int* b = /* create pointer to (*c) */;

such that:

assert(a != b);  // the pointers point to a different address
assert(*b == 3);
*a = 2;
assert(*b == 2);  // but they refer to the same value

Is this undefined behavior? If yes, which part of the C++ standard disallows it? If not, which parts of the C++ standard allow it?

Note: the memory c points to is allocated with a memory allocation function that returns a unique address (new, malloc, ...). The way to create these pointers with different values is very platform-specific, although on most Unix systems it can be done with mmap and on Windows it can be done with VirtualAlloc.



Background: most operating systems (those whose userspace does not run in ring 0) run their processes in virtual memory and keep a map from virtual memory pages to physical memory pages. Some of these systems (Linux/macOS/BSDs/Unixes and 64-bit Windows) provide system calls (like mmap or VirtualAlloc) that can be used to map two virtual memory pages to the same physical memory page. When a process does this, it can access the same page of physical memory through two different virtual memory addresses. That is, the two pointers will have different values, but they will access the same physical memory storage. Keywords to google for: mmap, virtual memory, memory pages. Data structures that profit from this feature are "magic ring buffers" (that's the technical term) and non-reallocating dynamically-sized vectors (that is, vectors that do not need to reallocate memory when they grow). Google provides more information about these than I could ever fit here.

Very minimal, probably non-working example (Unix only):

We first allocate an int on the heap. The following requests an anonymous, non-file-backed mapping of virtual memory. One must request at least a whole memory page here, but for simplicity I'll just request the size of an int (mmap will allocate a whole memory page anyway):

int* c = static_cast<int*>(mmap(NULL, sizeof(int), PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0));

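Now we need this storage to be reachable from two independent memory locations, so we map the same file twice, e.g., at two adjacent memory locations. We won't really use this file, but we still need to create it and open it (some_fd below). Note that on Linux the address passed with MAP_FIXED must be page-aligned, so in real code the second mapping would start one page, not one int, after the first: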

mmap(c, sizeof(int), PROT_READ | PROT_WRITE, MAP_FIXED | MAP_SHARED, some_fd, 0);
mmap(c + 1, sizeof(int), PROT_READ | PROT_WRITE, MAP_FIXED | MAP_SHARED, some_fd, 0);

Now we are almost done:

int* a = c;
int* b = c + 1;

These are obviously different virtual addresses:

assert(a != b);

But they point to the same, non-file-backed, physical memory page:

*a = 314;
assert(*b == 314);

So there you go. Using VirtualAlloc the same can be done on Windows, but the API is a bit different.
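
For readers who want to try the idea above, here is a minimal sketch of a variant that should run on Linux and similar POSIX systems. It is not the code from the question: the helper name map_page_twice, the AliasedPair struct, and the temporary-file template under /tmp are all assumptions made for illustration. The two views sit a full page apart because MAP_FIXED needs page-aligned addresses, and the pointers are volatile only so that every read and write is actually performed.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>   // only needed for the assertions in the usage note
#include <cstddef>
#include <cstdlib>

// Hypothetical helper type: two pointers with different values that are
// backed by the same page of an unlinked temporary file.
struct AliasedPair {
    volatile int* a = nullptr;
    volatile int* b = nullptr;
    bool ok = false;
};

// Hypothetical helper: maps the first page of a temporary file at two
// adjacent virtual pages and returns a pointer into each view.
AliasedPair map_page_twice() {
    AliasedPair result;
    const long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return result;

    // Create the backing file; the /tmp template is an assumption.
    char path[] = "/tmp/alias-demo-XXXXXX";
    const int fd = mkstemp(path);
    if (fd == -1) return result;
    unlink(path);  // the file lives on until fd and the mappings are gone
    if (ftruncate(fd, page) == -1) { close(fd); return result; }

    // Reserve two contiguous pages of address space.
    void* base = mmap(nullptr, 2 * page, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) { close(fd); return result; }

    // Replace both reserved pages with shared views of file offset 0.
    char* p = static_cast<char*>(base);
    void* first = mmap(p, page, PROT_READ | PROT_WRITE,
                       MAP_FIXED | MAP_SHARED, fd, 0);
    void* second = mmap(p + page, page, PROT_READ | PROT_WRITE,
                        MAP_FIXED | MAP_SHARED, fd, 0);
    close(fd);  // the mappings keep the file alive
    if (first == MAP_FAILED || second == MAP_FAILED) {
        munmap(base, 2 * page);
        return result;
    }

    result.a = static_cast<volatile int*>(first);
    result.b = static_cast<volatile int*>(second);
    result.ok = true;
    return result;
}
```

Assuming the mappings succeed (ok is true), a caller gets two pointers for which a != b holds, and after *a = 314 a read of *b also yields 314, which is the situation the question asks about.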


Solution

  • First, let's look at what the standard has to say about an object:

    [intro.object]

    The constructs in a C++ program create, destroy, refer to, access, and manipulate objects. An object is a region of storage. [ Note: A function is not an object, regardless of whether or not it occupies storage in the way that objects do. —end note ] An object is created by a definition (3.1), by a new-expression (5.3.4) or by the implementation (12.2) when needed. The properties of an object are determined when the object is created. An object can have a name (Clause 3). An object has a storage duration (3.7) which influences its lifetime (3.8). An object has a type (3.9). The term object type refers to the type with which the object is created. Some objects are polymorphic (10.3); the implementation generates information associated with each such object that makes it possible to determine that object’s type during program execution. For other objects, the interpretation of the values found therein is determined by the type of the expressions (Clause 5) used to access them.

    And then we have

    Unless an object is a bit-field or a base class subobject of zero size, the address of that object is the address of the first byte it occupies. Two objects that are not bit-fields may have the same address if one is a subobject of the other, or if at least one is a base class subobject of zero size and they are of different types; otherwise, they shall have distinct addresses.

    So we know that an object has an address, and that this address is that of the first byte of the storage it occupies. If we look at what a byte is, we have

    [intro.memory]

    The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain any member of the basic execution character set (2.3) and the eight-bit code units of the Unicode UTF-8 encoding form and is composed of a contiguous sequence of bits, the number of which is implementation-defined. The least significant bit is called the low-order bit; the most significant bit is called the high-order bit. The memory available to a C++ program consists of one or more sequences of contiguous bytes. Every byte has a unique address.

    Emphasis mine (on the last sentence: "Every byte has a unique address.")

    So if we have a pointer to an object, the pointer holds a unique value (address). If we have another pointer to that same object, it must hold that same value (address). Undefined behavior does not even enter the equation, as you simply cannot have two pointers to the same object that have different values.
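
As a small illustration of the quoted [intro.object] wording, the sketch below checks three cases inside the abstract machine: two pointers to the same object compare equal, two distinct complete objects have distinct addresses, and a standard-layout struct shares its address with its first member (a subobject). The type Pair and the function names are made up for this example and come neither from the standard nor from the question.

```cpp
#include <cassert>   // only needed if the checks are wrapped in assert

// Hypothetical standard-layout type used to show the subobject case.
struct Pair {
    int first;
    int second;
};

// Two pointers obtained from the same object hold the same address.
bool same_object_same_address() {
    int x = 3;
    int* p = &x;
    int* q = &x;
    return p == q;
}

// Two distinct complete objects must have distinct addresses.
bool distinct_objects_distinct_addresses() {
    int x = 0;
    int y = 0;
    return &x != &y;
}

// An object and its first member may share an address, because one is a
// subobject of the other.
bool subobject_shares_address() {
    Pair s{1, 2};
    return static_cast<void*>(&s) == static_cast<void*>(&s.first);
}
```

All three functions return true on a conforming implementation. None of them involves the operating-system page mappings from the question, which fall outside what the standard's memory model describes.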