c++ c++11 casting undefined-behavior dynamic-cast

What is the meaning of `*dynamic_cast<T*>(...)`?


Recently I was looking in the code of an open source project, and I saw a bunch of statements of the form T & object = *dynamic_cast<T*>(ptr);.

(Actually, this was occurring in a macro used to declare many functions following a similar pattern.)

To me this looked like a code smell. My reasoning was: if you know the cast will succeed, why not use a static_cast? And if you aren't sure, shouldn't you use an assert to test, since the compiler can assume that any pointer you dereference with * is not null?

I asked one of the devs on IRC about it, and he said that he considers a static_cast downcast to be unsafe. They could add an assert, but even if they don't, he says you will still get a null pointer dereference and a crash when object is actually used. (Because, on failure, the dynamic_cast will convert the pointer to null, and when you then access any member, you will be reading from some address very close to zero, which the OS won't allow.) If you use a static_cast and it goes bad, you might just get some memory corruption. So by using the *dynamic_cast option, you are trading off speed for slightly better debuggability. You aren't paying for the assert; instead you are basically relying on the OS to catch the nullptr dereference. At least, that's what I understood.

I accepted that explanation at the time, but it bothered me and I thought about it some more.

Here's my reasoning.

If I understand the standard right, a static_cast pointer cast basically means doing some fixed pointer arithmetic. That is, if I have A * a, and I static_cast it to a related type B *, what the compiler is actually going to do is add some offset to the pointer, the offset depending only on the layout of the types A and B (and potentially on the C++ implementation). This theory can be tested by static casting pointers to void * and outputting them, before and after the static_cast. I expect that if you look at the generated assembly, the static_cast will turn into "add some fixed constant to the register corresponding to the pointer."

A dynamic_cast pointer cast means: first check the RTTI, and only do the static cast if it is valid based on the dynamic type. If it is not, return nullptr. So I'd expect that the compiler will at some point expand an expression dynamic_cast<B*>(ptr), where ptr is of type A*, into an expression like

(__validate_dynamic_cast_A_to_B(ptr) ? static_cast<B*>(ptr) : nullptr)

However, if we then * the result of the dynamic_cast, * of nullptr is UB, so we are implicitly promising that the nullptr branch never happens. And conforming compilers are permitted to "reason backwards" from that and eliminate null checks, a point driven home in Chris Lattner's famous blog post.

If the test function __validate_dynamic_cast_A_to_B(ptr) is opaque to the optimizer, i.e. it might have side effects, then the optimizer can't get rid of it, even if it "knows" the nullptr branch doesn't happen. However, probably this function is not opaque to the optimizer -- probably it has a very good understanding of its possible side effects.

So, my expectation is that the optimizer will essentially convert *dynamic_cast<T*>(ptr) into *static_cast<T*>(ptr), and that interchanging these should give the same generated assembly.

If true, that would justify my original argument that *dynamic_cast<T*> is a code smell, even if you don't really care about UB in your code and only care about what "actually" happens. Because, if a conforming compiler would be permitted to change it to a static_cast silently, then you aren't getting any safety that you think you are, so you should either explicitly static_cast or explicitly assert. At least, that would be my vote in a code review. I'm trying to figure out if that argument is actually right.


Here is what the standard says about dynamic_cast:

[5.2.7] Dynamic Cast [expr.dynamic.cast]
1. The result of the expression dynamic_cast<T>(v) is the result of converting the expression v to type T. T shall be a pointer or reference to a complete class type, or "pointer to cv void." The dynamic_cast operator shall not cast away constness.
...
8. If C is the class type to which T points or refers, the run-time check logically executes as follows:
(8.1) - If, in the most derived object pointed (referred) to by v, v points (refers) to a public base class subobject of a C object, and if only one object of type C is derived from the subobject pointed (referred) to by v the result points (refers) to that C object.
(8.2) - Otherwise, if v points (refers) to a public base class subobject of the most derived object, and the type of the most derived object has a base class, of type C, that is unambiguous and public, the result points (refers) to the C subobject of the most derived object.
(8.3) - Otherwise, the run-time check fails.

Assuming that the hierarchy of classes is known at compile time, the relative offsets of each of these classes within each other's layouts are also known. If v is a pointer to type A, and we want to cast it to a pointer of type B, and the cast is unambiguous, then the shift that v must take is a compile-time constant. Even if v actually points to an object of a more derived type C, that fact doesn't change where the A subobject lies relative to the B subobject, right? So no matter what the type C is, even if it is some unknown type from another compilation unit, to my knowledge the result of a dynamic_cast<T*>(ptr) has only two possible values: nullptr or "fixed offset from ptr".


However, the plot thickens somewhat upon actually looking at some code gen.

Here's a simple program that I made to investigate this:


int output = 0;

struct A {
  explicit A(int n) : num_(n) {}
  int num_;

  virtual void foo() {
    output += num_;
  }
};

struct B final : public A {
  explicit B(int n) : A(n), num2_(2 * n) {}

  int num2_;

  virtual void foo() override {
    output -= num2_;
  }
};

void visit(A * ptr) {
  B & b = *dynamic_cast<B*>(ptr);
  b.foo();
  b.foo();
}

int main() {
  A * ptr = new B(5); 

  visit(ptr);

  ptr = new A(10);
  visit(ptr);

  return output;
}

According to the godbolt compiler explorer, the gcc 5.3 x86-64 assembly for this, with options -O3 -std=c++11, looks like this:


A::foo():
        movl    8(%rdi), %eax
        addl    %eax, output(%rip)
        ret
B::foo():
        movl    12(%rdi), %eax
        subl    %eax, output(%rip)
        ret
visit(A*):
        testq   %rdi, %rdi
        je      .L4
        subq    $8, %rsp
        xorl    %ecx, %ecx
        movl    typeinfo for B, %edx
        movl    typeinfo for A, %esi
        call    __dynamic_cast
        movl    12(%rax), %eax
        addl    %eax, %eax
        subl    %eax, output(%rip)
        addq    $8, %rsp
        ret
.L4:
        movl    12, %eax
        ud2
main:
        subq    $8, %rsp
        movl    $16, %edi
        call    operator new(unsigned long)
        movq    %rax, %rdi
        movl    $5, 8(%rax)
        movq    vtable for B+16, (%rax)
        movl    $10, 12(%rax)
        call    visit(A*)
        movl    $16, %edi
        call    operator new(unsigned long)
        movq    vtable for A+16, (%rax)
        movl    $10, 8(%rax)
        movq    %rax, %rdi
        call    visit(A*)
        movl    output(%rip), %eax
        addq    $8, %rsp
        ret
typeinfo name for A:
typeinfo for A:
typeinfo name for B:
typeinfo for B:
vtable for A:
vtable for B:
output:
        .zero   4

When I change the dynamic_cast to a static_cast, I get the following instead:


A::foo():
        movl    8(%rdi), %eax
        addl    %eax, output(%rip)
        ret
B::foo():
        movl    12(%rdi), %eax
        subl    %eax, output(%rip)
        ret
visit(A*):
        movl    12(%rdi), %eax
        addl    %eax, %eax
        subl    %eax, output(%rip)
        ret
main:
        subq    $8, %rsp
        movl    $16, %edi
        call    operator new(unsigned long)
        movl    $16, %edi
        subl    $20, output(%rip)
        call    operator new(unsigned long)
        movl    12(%rax), %edx
        movl    output(%rip), %eax
        subl    %edx, %eax
        subl    %edx, %eax
        movl    %eax, output(%rip)
        addq    $8, %rsp
        ret
output:
        .zero   4

Here's the same with clang 3.8 and same options.

dynamic_cast:


visit(A*):                            # @visit(A*)
        xorl    %eax, %eax
        testq   %rdi, %rdi
        je      .LBB0_2
        pushq   %rax
        movl    typeinfo for A, %esi
        movl    typeinfo for B, %edx
        xorl    %ecx, %ecx
        callq   __dynamic_cast
        addq    $8, %rsp
.LBB0_2:
        movl    output(%rip), %ecx
        subl    12(%rax), %ecx
        movl    %ecx, output(%rip)
        subl    12(%rax), %ecx
        movl    %ecx, output(%rip)
        retq

B::foo():                            # @B::foo()
        movl    12(%rdi), %eax
        subl    %eax, output(%rip)
        retq

main:                                   # @main
        pushq   %rbx
        movl    $16, %edi
        callq   operator new(unsigned long)
        movl    $5, 8(%rax)
        movq    vtable for B+16, (%rax)
        movl    $10, 12(%rax)
        movl    typeinfo for A, %esi
        movl    typeinfo for B, %edx
        xorl    %ecx, %ecx
        movq    %rax, %rdi
        callq   __dynamic_cast
        movl    output(%rip), %ebx
        subl    12(%rax), %ebx
        movl    %ebx, output(%rip)
        subl    12(%rax), %ebx
        movl    %ebx, output(%rip)
        movl    $16, %edi
        callq   operator new(unsigned long)
        movq    vtable for A+16, (%rax)
        movl    $10, 8(%rax)
        movl    typeinfo for A, %esi
        movl    typeinfo for B, %edx
        xorl    %ecx, %ecx
        movq    %rax, %rdi
        callq   __dynamic_cast
        subl    12(%rax), %ebx
        movl    %ebx, output(%rip)
        subl    12(%rax), %ebx
        movl    %ebx, output(%rip)
        movl    %ebx, %eax
        popq    %rbx
        retq

A::foo():                            # @A::foo()
        movl    8(%rdi), %eax
        addl    %eax, output(%rip)
        retq

output:
        .long   0                       # 0x0

typeinfo name for A:

typeinfo for A:

typeinfo name for B:

typeinfo for B:

vtable for B:

vtable for A:

static_cast:


visit(A*):                            # @visit(A*)
        movl    output(%rip), %eax
        subl    12(%rdi), %eax
        movl    %eax, output(%rip)
        subl    12(%rdi), %eax
        movl    %eax, output(%rip)
        retq

main:                                   # @main
        retq

output:
        .long   0                       # 0x0

So, in both cases, it seems that dynamic_cast cannot be eliminated by the optimizer:

It seems to generate calls to a mysterious __dynamic_cast function, using the typeinfo of both classes, no matter what. Even if all optimizations are on, and B is marked final.

  • Does this low-level call have side effects that I didn't consider? My understanding was that the vtables are essentially fixed and that the vptr in an object doesn't change... am I right? I have only basic familiarity with how vtables are actually implemented and tbh I usually avoid virtual functions in my code, so I haven't really thought deeply on it or accumulated experience.

  • Am I right that a conforming compiler could replace *dynamic_cast<T*>(ptr) with *static_cast<T*>(ptr) as a valid optimization?

  • Is it true that "usually" (meaning, on x86 machines, let's say, and casting between classes in a hierarchy of "usual" complexity) a dynamic_cast cannot be optimized away, and will in fact produce a nullptr even if you * it right after, leading to nullptr dereference and crash upon accessing the object?

  • Is "always replace *dynamic_cast<T*>(ptr) with either a dynamic_cast plus a test or an assertion of some kind, or with *static_cast<T*>(ptr)" sound advice?


Solution

  • T& object = *dynamic_cast<T*>(ptr); is broken because it invokes UB on failure, period. I see no need to belabor the point. Even if it seems to work on current compilers, it may not work on later versions with more aggressive optimizers.

    If you want checks and don't want to be bothered writing an assertion, use the reference form that throws bad_cast on failure:

    T& object = dynamic_cast<T&>(*ptr);
    

    dynamic_cast isn't just a run-time check. It can do things static_cast can't. For example, it can cast sideways.

    A   A (*)
    |   |
    B   C
    \   /
     \ /
      D
    

    If the actual most derived object is a D, and you have a pointer to the A base marked with *, you can actually dynamic_cast it to get a pointer to the B subobject:

    #include <cassert>

    struct A { virtual ~A() = default; };
    struct B : A {};
    struct C : A {};
    struct D : B, C {};
    void f() {
        D d;
        C& c = d;
        A& a = c;
        assert(dynamic_cast<B*>(&a) != nullptr);
    }
    

    Note that a static_cast here would be completely wrong.

    (Another prominent example where dynamic_cast can do something static_cast can't is when you are casting from a virtual base to a derived class.)

    In a world without final or whole-program knowledge, you have to do the check at run time (because C and D may not be visible to you). With final on B, you should be able to get away with not doing it, but I'm not surprised if compilers haven't gotten around to optimizing that case yet.