Search code examples
c++allocationrestrict-qualifier

Restricted access for allocated arrays belonging to separate object


I was thinking about the utility of non-standard __restrict keyword in C and C++ and how its effect can be emulated by carefully declare (disjoint) value objects .

Restrict is usually explained through indirect memory access that might or might not overlap memory addresses:

int foo(Pointer1 a, Pointer2 b) {  // adding non-standard `restrict` keyword might hint that dreaded_function_call is never called
    *a = 5;
    *b = 6;
    if(*a == *b) dreaded_function_call();  // in isolation this function may or may not be called
}

If the compiler can prove that a and b do not overlap then the dreaded_function_call() is never called or referred in the compilation.

This is exactly what I achieve in this example, with GCC at least, dreaded_function_call doesn't even appear in the generated machine code.

#include<vector>

template<class It> void modify_v(It);

void dreaded_function_call();

template<class It1, class It2>
void foo(It1 a, It2 b) {
    *a = 5;
    *b = 6;
    if(*a == *b) dreaded_function_call();
}

int main() {
    std::vector<int> v1(10); modify_v(v1.begin());
    std::vector<int> v2(10); modify_v(v2.begin());

    foo(v1.begin() + 5, v2.begin() + 5);
}

However, if I slightly change the code to generate the vector themselves from separate function calls, I loose this optimization. I can observe this since the generated code will branch and still consider the possibly of calling dreaded_function_call.

std::vector<int> make_v();
...
int main() {
    std::vector<int> v1 = make_v();
    std::vector<int> v2 = make_v();

    foo(v1.begin() + 5, v2.begin() + 5);
}

What is going on here? make_v returns by copy, so v1 and v2 should be disjoint just like in the case above. Yet the compiler doesn't do the same optimization.

Is this just missed opportunity for optimization, or this optimization would be just outright invalid?

Here is the compiled code that illustrates both cases with GCC: https://godbolt.org/z/WKKzs1edc

(Clang gives the same behavior.)


EDIT: As one of the comments points out, the culprit might be the fact that the compiler cannot see how the vectors are allocated since make_v is hidden from view, even if make_v returns a copy. (Also copy elision could be interfering here?)

In this case, the following code could be more clear with respect to whether the copy is visible or not,

std::vector<int> make_v();
...
int main() {
    std::vector<int> v0 = make_v();

    std::vector<int> v1 = v0;
    std::vector<int> v2 = v0;

    foo(v1.begin() + 5, v2.begin() + 5);
}

Indeed, in this case GCC doesn't still optimize (doesn't effectively restricts pointer) but Clang does. As illustrated here, https://godbolt.org/z/4xhq1975x

So there is a combination of factors here, the last factor being the compiler itself. But in my opinion, it seems that return-by-copy and the analysis of the copy constructor should be enough.

If this is a consequence of return by move or RVO, then I would say that this is a hidden cost of these features. Just my opinion.


Solution

  • Each std::vector<int> holds a pointer that points to the actual memory region holding the elements.

    We know from the library specification in the standard, that it is impossible for two std::vector<int> objects to refer to the same element objects or the same storage.

    However, from the actual core language point-of-view, there is nothing preventing you from writing a make_v function that always returns std::vector<int> objects that have their internal pointer initialized to the same value. In particular, even if there is no public member function to achieve this, it is always possible to use tricks to access private data members directly. Class invariants aren't actually usable by the optimizer, even if it was clever enough to understand them.

    Of course, any way that you could attempt to do this would have undefined behavior under the library rules, but the compilers (with some smaller exceptions) do not know or take into account the library specification and instead work with the standard library code just like any other code.

    In your first example the compiler can see through inlining to the actual operator new calls used to allocate memory for the elements. And two calls to operator new must always produce pointers to disjoint memory without intervening operator delete call to the first pointer value. That's knowledge that is built-in to the compiler's optimizer.

    No pointer or reference that could be used to access the std::vector<int> objects is given to modify_v and no such pointers escape the function, so the compiler also knows that modify_v can't modify where the vector objects' internal pointers point, nor can modify_v call operator delete on the pointer.