Recently I was looking through the code of an open source project, and I saw a bunch of statements of the form T & object = *dynamic_cast<T*>(ptr);
(Actually this was occurring in a macro used to declare many functions following a similar pattern.)
To me this looked like a code smell. My reasoning was: if you know the cast will succeed, then why not use a static_cast? If you aren't sure, then shouldn't you use an assert to test, since the compiler can assume that any pointer you dereference with * is not null?
I asked one of the devs on IRC about it, and he said that he considers a static_cast downcast to be unsafe. They could add an assert, but even if they don't, he says you will still get a null pointer dereference and crash when obj is actually used. (Because, on failure, the dynamic_cast will convert the pointer to null, and then when you access any member, you will be reading from an address very close to zero, which the OS won't allow.) If you use a static_cast and it goes bad, you might just get some memory corruption. So by using the *dynamic_cast option, you are trading off speed for slightly better debuggability. You aren't paying for the assert; instead you are basically relying on the OS to catch the nullptr dereference. At least that's what I understood.
I accepted that explanation at the time, but it bothered me and I thought about it some more.
Here's my reasoning.
If I understand the standard right, a static_cast pointer cast basically means doing some fixed pointer arithmetic. That is, if I have A * a and I static_cast it to a related type B *, what the compiler is actually going to do is add some offset to the pointer, the offset depending only on the layout of the types A and B (and potentially on the C++ implementation). This theory can be tested by static casting pointers to void * and outputting them, before and after the static_cast. I expect that if you look at the generated assembly, the static_cast will turn into "add some fixed constant to the register corresponding to the pointer."
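The experiment described above can be sketched roughly as follows. This is only an illustration under assumptions: the types Left, Right and Both and the helpers downcast_shift and print_cast are hypothetical names that do not come from the project in question, and the actual offset depends on the ABI.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical types for illustration only.
struct Left  { int left_value = 1; };
struct Right { int right_value = 2; };
struct Both : Left, Right { int own_value = 3; };

// Byte distance the compiler applies when static_cast-ing a Right* down to a Both*.
// No run-time check is involved; the caller must know r really is part of a Both.
std::ptrdiff_t downcast_shift(Right* r) {
    Both* d = static_cast<Both*>(r);
    return static_cast<std::ptrdiff_t>(reinterpret_cast<std::uintptr_t>(r) -
                                       reinterpret_cast<std::uintptr_t>(d));
}

// Print the raw addresses before and after the cast.
void print_cast(Right* r) {
    std::printf("before: %p  after: %p\n",
                static_cast<void*>(r),
                static_cast<void*>(static_cast<Both*>(r)));
}
```

Calling downcast_shift on the Right subobject of several different Both objects should give the same number every time, which is what "fixed pointer arithmetic" means here.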
A dynamic_cast pointer cast means: first check the RTTI, and only do the static cast if it is valid based on the dynamic type. If it is not, return nullptr. So I'd expect that the compiler will at some point expand an expression dynamic_cast<B*>(ptr), where ptr is of type A*, into an expression like
(__validate_dynamic_cast_A_to_B(ptr) ? static_cast<B*>(ptr) : nullptr)
However, if we then apply * to the result of the dynamic_cast, * of nullptr is UB, so we are implicitly promising that the nullptr branch never happens. And conforming compilers are permitted to "reason backwards" from that and eliminate null checks, a point driven home in Chris Lattner's famous blog post.
If the test function __validate_dynamic_cast_A_to_B(ptr) is opaque to the optimizer, i.e. it might have side effects, then the optimizer can't get rid of it, even if it "knows" the nullptr branch doesn't happen. However, this function is probably not opaque to the optimizer; the optimizer probably has a very good understanding of its possible side effects.
So, my expectation is that the optimizer will essentially convert *dynamic_cast<T*>(ptr) into *static_cast<T*>(ptr), and that interchanging these should give the same generated assembly.
If true, that would justify my original argument that *dynamic_cast<T*> is a code smell, even if you don't really care about UB in your code and only care about what "actually" happens. Because, if a conforming compiler would be permitted to change it to a static_cast silently, then you aren't getting the safety you think you are, so you should either explicitly static_cast or explicitly assert. At least, that would be my vote in a code review. I'm trying to figure out if that argument is actually right.
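To make the "explicitly assert" alternative concrete, here is a minimal sketch of what I would suggest in review. The helper name checked_downcast and the Shape/Circle/Square types are hypothetical, made up for this illustration, and this is just one way to write it.

```cpp
#include <cassert>

// Hypothetical hierarchy for illustration only.
struct Shape { virtual ~Shape() = default; };
struct Circle : Shape { double radius = 1.0; };
struct Square : Shape { double side = 2.0; };

// Downcast that is verified in debug builds and unchecked when NDEBUG is defined.
template <typename To, typename From>
To& checked_downcast(From* ptr) {
    // Debug builds: the dynamic_cast runs and a wrong dynamic type aborts here.
    // With NDEBUG: the assert expands to nothing, leaving only the static_cast.
    assert(dynamic_cast<To*>(ptr) != nullptr && "checked_downcast: wrong dynamic type");
    return *static_cast<To*>(ptr);
}
```

Usage would look like Circle& c = checked_downcast<Circle>(shape_ptr);. The intent is stated in the code: the failure case is treated as a programming error, not as something the OS is expected to catch.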
Here is what the standard says about dynamic_cast:
[5.2.7] Dynamic cast [expr.dynamic.cast]
1. The result of the expression dynamic_cast<T>(v) is the result of converting the expression v to type T. T shall be a pointer or reference to a complete class type, or "pointer to cv void." The dynamic_cast operator shall not cast away constness.
...
8. If C is the class type to which T points or refers, the run-time check logically executes as follows:
(8.1) - If, in the most derived object pointed (referred) to by v, v points (refers) to a public base class subobject of a C object, and if only one object of type C is derived from the subobject pointed (referred) to by v the result points (refers) to that C object.
(8.2) - Otherwise, if v points (refers) to a public base class subobject of the most derived object, and the type of the most derived object has a base class, of type C, that is unambiguous and public, the result points (refers) to the C subobject of the most derived object.
(8.3) - Otherwise, the run-time check fails.
Assuming that the hierarchy of classes is known at compile time, the relative offsets of each of these classes within each other's layouts are also known. If v is a pointer to type A, and we want to cast it to a pointer of type B, and the cast is unambiguous, then the shift that v must take is a compile-time constant. Even if v actually points to an object of a more derived type C, that fact doesn't change where the A subobject lies relative to the B subobject, right? So no matter what the type C is, even if it is some unknown type from another compilation unit, to my knowledge the result of a dynamic_cast<T*>(ptr) has only two possible values: nullptr or "fixed offset from ptr".
However, the plot thickens somewhat upon actually looking at some code gen.
Here's a simple program that I made to investigate this:
int output = 0;

struct A {
    explicit A(int n) : num_(n) {}
    int num_;
    virtual void foo() {
        output += num_;
    }
};

struct B final : public A {
    explicit B(int n) : A(n), num2_(2 * n) {}
    int num2_;
    virtual void foo() override {
        output -= num2_;
    }
};

void visit(A * ptr) {
    B & b = *dynamic_cast<B*>(ptr);
    b.foo();
    b.foo();
}

int main() {
    A * ptr = new B(5);
    visit(ptr);
    ptr = new A(10);
    visit(ptr);
    return output;
}
According to the godbolt Compiler Explorer, gcc 5.3 x86 assembly for this, with options -O3 -std=c++11, looks like this:
A::foo():
movl 8(%rdi), %eax
addl %eax, output(%rip)
ret
B::foo():
movl 12(%rdi), %eax
subl %eax, output(%rip)
ret
visit(A*):
testq %rdi, %rdi
je .L4
subq $8, %rsp
xorl %ecx, %ecx
movl typeinfo for B, %edx
movl typeinfo for A, %esi
call __dynamic_cast
movl 12(%rax), %eax
addl %eax, %eax
subl %eax, output(%rip)
addq $8, %rsp
ret
.L4:
movl 12, %eax
ud2
main:
subq $8, %rsp
movl $16, %edi
call operator new(unsigned long)
movq %rax, %rdi
movl $5, 8(%rax)
movq vtable for B+16, (%rax)
movl $10, 12(%rax)
call visit(A*)
movl $16, %edi
call operator new(unsigned long)
movq vtable for A+16, (%rax)
movl $10, 8(%rax)
movq %rax, %rdi
call visit(A*)
movl output(%rip), %eax
addq $8, %rsp
ret
typeinfo name for A:
typeinfo for A:
typeinfo name for B:
typeinfo for B:
vtable for A:
vtable for B:
output:
.zero 4
When I change the dynamic_cast to a static_cast, I get the following instead:
A::foo():
movl 8(%rdi), %eax
addl %eax, output(%rip)
ret
B::foo():
movl 12(%rdi), %eax
subl %eax, output(%rip)
ret
visit(A*):
movl 12(%rdi), %eax
addl %eax, %eax
subl %eax, output(%rip)
ret
main:
subq $8, %rsp
movl $16, %edi
call operator new(unsigned long)
movl $16, %edi
subl $20, output(%rip)
call operator new(unsigned long)
movl 12(%rax), %edx
movl output(%rip), %eax
subl %edx, %eax
subl %edx, %eax
movl %eax, output(%rip)
addq $8, %rsp
ret
output:
.zero 4
Here's the same with clang 3.8 and the same options.
dynamic_cast:
visit(A*): # @visit(A*)
xorl %eax, %eax
testq %rdi, %rdi
je .LBB0_2
pushq %rax
movl typeinfo for A, %esi
movl typeinfo for B, %edx
xorl %ecx, %ecx
callq __dynamic_cast
addq $8, %rsp
.LBB0_2:
movl output(%rip), %ecx
subl 12(%rax), %ecx
movl %ecx, output(%rip)
subl 12(%rax), %ecx
movl %ecx, output(%rip)
retq
B::foo(): # @B::foo()
movl 12(%rdi), %eax
subl %eax, output(%rip)
retq
main: # @main
pushq %rbx
movl $16, %edi
callq operator new(unsigned long)
movl $5, 8(%rax)
movq vtable for B+16, (%rax)
movl $10, 12(%rax)
movl typeinfo for A, %esi
movl typeinfo for B, %edx
xorl %ecx, %ecx
movq %rax, %rdi
callq __dynamic_cast
movl output(%rip), %ebx
subl 12(%rax), %ebx
movl %ebx, output(%rip)
subl 12(%rax), %ebx
movl %ebx, output(%rip)
movl $16, %edi
callq operator new(unsigned long)
movq vtable for A+16, (%rax)
movl $10, 8(%rax)
movl typeinfo for A, %esi
movl typeinfo for B, %edx
xorl %ecx, %ecx
movq %rax, %rdi
callq __dynamic_cast
subl 12(%rax), %ebx
movl %ebx, output(%rip)
subl 12(%rax), %ebx
movl %ebx, output(%rip)
movl %ebx, %eax
popq %rbx
retq
A::foo(): # @A::foo()
movl 8(%rdi), %eax
addl %eax, output(%rip)
retq
output:
.long 0 # 0x0
typeinfo name for A:
typeinfo for A:
typeinfo name for B:
typeinfo for B:
vtable for B:
vtable for A:
static_cast:
visit(A*): # @visit(A*)
movl output(%rip), %eax
subl 12(%rdi), %eax
movl %eax, output(%rip)
subl 12(%rdi), %eax
movl %eax, output(%rip)
retq
main: # @main
retq
output:
.long 0 # 0x0
So, with both compilers, it seems that dynamic_cast cannot be eliminated by the optimizer:
It seems to generate calls to a mysterious __dynamic_cast function, using the typeinfo of both classes, no matter what. Even if all optimizations are on, and B is marked final.
Does this low-level call have side effects that I didn't consider? My understanding was that the vtables are essentially fixed and that the vptr in an object doesn't change... am I right? I have only basic familiarity with how vtables are actually implemented and tbh I usually avoid virtual functions in my code, so I haven't really thought deeply about it or accumulated experience.
Am I right that a conforming compiler could replace *dynamic_cast<T*>(ptr) with *static_cast<T*>(ptr) as a valid optimization?
Is it true that "usually" (meaning, on x86 machines, let's say, and casting between classes in a hierarchy of "usual" complexity) a dynamic_cast cannot be optimized away, and will in fact produce a nullptr even if you * it right after, leading to a nullptr dereference and crash upon accessing the object?
Is "always replace *dynamic_cast<T*>(ptr) with either dynamic_cast + test or assertion of some kind, or with *static_cast<T*>(ptr)" sound advice?
T& object = *dynamic_cast<T*>(ptr); is broken because it invokes UB on failure, period. I see no need to belabor the point. Even if it seems to work on current compilers, it may not work on later versions with more aggressive optimizers.
If you want checks and don't want to be bothered writing an assertion, use the reference form that throws bad_cast on failure:
T& object = dynamic_cast<T&>(*ptr);
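As a rough sketch of how the reference form behaves (the types Base, Derived and Other and the function is_derived are hypothetical names used only for this illustration), a failed cast surfaces as a std::bad_cast exception, not as a null pointer:

```cpp
#include <cassert>
#include <typeinfo>  // std::bad_cast

// Hypothetical hierarchy for illustration only.
struct Base { virtual ~Base() = default; };
struct Derived : Base { int payload = 42; };
struct Other : Base {};

// Returns true if *ptr is a Derived, false if the reference cast throws.
// ptr itself must be non-null: forming *ptr from a null pointer is still UB.
bool is_derived(Base* ptr) {
    try {
        Derived& d = dynamic_cast<Derived&>(*ptr);
        return d.payload == 42;
    } catch (const std::bad_cast&) {
        return false;  // wrong dynamic type: well-defined failure path
    }
}
```

Because the failure path is an exception, there is no null dereference for the optimizer to reason backwards from.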
dynamic_cast isn't just a run-time check. It can do things static_cast can't. For example, it can cast sideways.
A     A (*)
|     |
B     C
 \   /
  \ /
   D
If the actual most derived object is a D, and you have a pointer to the A base marked with *, you can actually dynamic_cast it to get a pointer to the B subobject:
#include <cassert>

struct A { virtual ~A() = default; };
struct B : A {};
struct C : A {};
struct D : B, C {};

void f() {
    D d;
    C& c = d;
    A& a = c;  // the A base inside C, i.e. the one marked with *
    assert(dynamic_cast<B*>(&a) != nullptr);
}
Note that a static_cast here would be completely wrong.
(Another prominent example where dynamic_cast can do something static_cast can't is when you are casting from a virtual base to a derived class.)
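A minimal sketch of the virtual-base case, using hypothetical types VBase, Mid1, Mid2 and Most and a hypothetical helper to_mid2 (none of these come from the question). Here a static_cast from VBase* to Mid2* is rejected by the compiler, because where the virtual base sits relative to Mid2 depends on the most derived type:

```cpp
#include <cassert>

// Hypothetical hierarchy for illustration only.
struct VBase { virtual ~VBase() = default; };
struct Mid1 : virtual VBase { int a = 1; };
struct Mid2 : virtual VBase { int b = 2; };
struct Most : Mid1, Mid2 {};

// Cast from the virtual base down to Mid2.
// static_cast<Mid2*>(v) would not compile: VBase is a virtual base of Mid2.
Mid2* to_mid2(VBase* v) {
    return dynamic_cast<Mid2*>(v);  // consults the dynamic type at run time
}
```

The same function has to work whether v points into a standalone Mid2 or into a Most, which is why the adjustment cannot be a single compile-time constant in general.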
In a world without final or whole-program knowledge, you have to do the check at run time (because C and D may not be visible to you). With final on B, you should be able to get away with not doing it, but I'm not surprised if compilers haven't gotten around to optimizing that case yet.