Tags: c++, arrays, pointers, pointer-arithmetic

Iterating over array of polymorphic objects


Consider the following code:

#include <iostream>

struct B {
    char i;
    B(char i) : i(i) {};
    void bar() {};
};

struct D : B {
    int y;
    D(char i, int y) : B(i), y(y) {};
};

void foo(B *arr, size_t size)
{
    for(B *end = arr + size; arr < end; ++arr) {
        std::cout << arr->i << std::endl;
    }
}

int main()
{
    D arr[3] = { {'a', 65}, {'b', 66}, {'c', 67} };
    foo(arr, sizeof(arr) / sizeof(*arr));
}

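As expected, only a is printed. More precisely, a followed by two padding bytes: the pointer advances by sizeof(B), so arr + 1 and arr + 2 land on the padding that follows i inside the first D.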

Now make B's member function bar virtual, which makes both classes polymorphic. In this configuration the program outputs abc with both clang and gcc. That would seem to mean both compilers calculate offsets from the dynamic type using some runtime info, which is nonsense as far as I am concerned.

I also tried to add another derived class with different layout:

struct C : B {
    long y;
    C(char i, long y) : B(i), y(y) {};
};
//...
C rra[] = { {'a', 65}, {'b', 66}, {'c', 67} };
foo(rra, sizeof(rra) / sizeof(*rra));

In my case the output was a bizarre apB, which is not what I initialized those with. So it seems that no runtime info is used to calculate the offsets.

So, my question is pretty straightforward:

  1. When iterating over an array of derived objects via a pointer to base in a polymorphic context, which offset will be used according to the standard?

I sifted through the standard and found no mention of runtime info affecting the offsets.

[expr.add] doesn't really clarify the matter. Technically, it says that the resulting pointer shall point to an element of the array.

It gets really odd when I add a virtual function foo that prints:

#include <iostream>

struct B {
    char i;
    B(char i) : i(i) {};
    virtual void foo() { std::cout << "I AM BASE" << i << std::endl; };
};

struct D : B {
    int y;
    D(char i, int y) : B(i), y(y) {};
    virtual void foo() { std::cout << "I AM DERIVED" << i << std::endl; };
};

struct C : B {
    long y;
    C(char i, long y) : B(i), y(y) {};
    virtual void foo() { std::cout << "I AM CERIVED" << i << std::endl; };

};

void foo(B *arr, size_t size)
{
    for(B *end = arr + size; arr < end; ++arr) {
        std::cout << arr->i << std::endl;
        arr->foo();
    }
}

int main()
{
    D arr[] = { {'a', 65}, {'d', 66}, {'c', 67} };
    foo(arr, sizeof(arr) / sizeof(*arr));
    C rra[] = { {'a', 70}, {'d', 66}, {'c', 67} };
    foo(rra, sizeof(rra) / sizeof(*rra));
}

It prints I AM DERIVED with the correct char for the first call (the D array), but only one I AM CERIVED for the second call (the C array), and then fails with SIGSEGV.

I can reproduce it with the latest versions of both clang and gcc. The link to godbolt.

EDIT: Technically, the question is answered. If anybody is interested in why iterating over an array of D via a pointer to B worked once virtual functions were added (which was my initial concern), here is my explanation: with virtual functions, a vtable pointer is added to each struct, which forces the structs to align on an 8-byte boundary. This bloats B and D to 16 bytes, and C to 24 (on the platform used in the examples). So iterating over an array of D via a pointer to B works only because the two class sizes happen to be the same. It is UB nonetheless.


Solution

  • Your foo() loop exhibits undefined behavior in your example.

    It is expecting a pointer to (the 1st element of) an array of B objects, and thus can iterate only B objects, eg:

      B   B   B
    +---+---+---+
    | i | i | i |
    +---+---+---+
    ^   ^   ^   ^
    arr |   |   |
      arr+1 |   |
          arr+2 |
              arr+3
    

    But you are passing in pointers to arrays of C and D objects instead. A derived object is generally larger than its base (and even when the sizes happen to match, as for D in the polymorphic version, the behavior is still undefined), so the pointer arithmetic used by the iteration will be off, which is why you are getting weird results, i.e.:

      C    C    C
    +----+----+----+
    | iy | iy | iy |
    +----+----+----+
    ^   ^   ^   ^
    arr |   |   |
      arr+1 |   |
          arr+2 |
              arr+3
    

    Pointer arithmetic advances by the size of the pointer's static (declared) type; in this case, it advances by exactly sizeof(B) bytes. It does not advance by the size of the object actually being pointed at (i.e., C or D), as you are expecting.

    To do what you are attempting, you must use a virtual method to handle the polymorphic printout, AND you must pass in a pointer to an array of B* pointers to objects, not a pointer to an array of objects, eg:

      C    C    C
    +----+----+----+
    | iy | iy | iy |
    +----+----+----+
      ^    ^    ^
      |    |    |
    +----+----+----+
    | B* | B* | B* |
    +----+----+----+
    ^    ^    ^    ^
    arr  |    |    |
       arr+1  |    |
            arr+2  |
                 arr+3
    
    #include <cstddef>   // size_t
    #include <iostream>
    #include <iterator>  // std::size (C++17)
    
    struct B {
        char i;
        B(char i) : i(i) {};
        virtual void foo() { std::cout << "I AM BASE: " << i << std::endl; };
    };
    
    struct D : B {
        int y;
        D(char i, int y) : B(i), y(y) {};
        void foo() override { std::cout << "I AM DERIVED: " << i << std::endl; };
    };
    
    struct C : B {
        long y;
        C(char i, long y) : B(i), y(y) {};
        void foo() override { std::cout << "I AM CERIVED: " << i << std::endl; };
    };
    
    void foo(B* arr[], size_t size)
    {
        for(size_t i = 0; i < size; ++i) {
            arr[i]->foo();
        }
    }
    
    int main()
    {
        D arr[] = { {'a', 65}, {'d', 66}, {'c', 67} };
        B* arr_d[] = { &arr[0], &arr[1], &arr[2] };
        foo(arr_d, std::size(arr_d));
    
        C rra[] = { {'a', 70}, {'d', 66}, {'c', 67} };
        B* arr_c[] = { &rra[0], &rra[1], &rra[2] };
        foo(arr_c, std::size(arr_c));
    }
    
