I'm confused by the assembly output of Visual C++ 2015 (x86).
I want to understand the virtual table layout in VC++, so I wrote the following simple class with a virtual function:
#include <stdio.h>
#include <stdint.h>  // uintptr_t
struct Foo
{
    virtual int GetValue()
    {
        uintptr_t vtbl = *(uintptr_t *)this;        // the vptr is the first field of the object
        uintptr_t slot0 = ((uintptr_t *)vtbl)[0];
        uintptr_t slot1 = ((uintptr_t *)vtbl)[1];
        printf("vtbl = 0x%08X\n", vtbl);
        printf(" [0] = 0x%08X\n", slot0);
        printf(" [1] = 0x%08X\n", slot1);
        return 0xA11BABA;
    }
};
extern "C" void Check();  // implemented in check.asm
int main()
{
    Foo *pFoo = new Foo;
    int x = pFoo->GetValue();
    printf("x = 0x%08X\n", x);
    printf("\n");
    Check();
}
To check the layout, I implemented an assembly function (the magic name comes from vtab.asm, the assembly output of vtab.cpp, and is the mangled name of Foo::GetValue):
.model flat
extern _printf : proc
extern ?GetValue@Foo@@UAEHXZ : proc
.const
FUNC_ADDR db "Address of Foo::GetValue = 0x%08X", 10, 0
.code
_Check proc
push ebp
mov ebp, esp      ; set up the frame pointer (was reversed)
push offset ?GetValue@Foo@@UAEHXZ
push offset FUNC_ADDR
call _printf
add esp, 8
pop ebp
ret
_Check endp
end
Then I compiled and ran it:
ml /c check.asm
cl /Fa vtab.cpp check.obj
vtab
I got the following output on my computer:
vtbl = 0x00FF2174
[0] = 0x00FE1300
[1] = 0x6C627476
x = 0x0A11BABA
Address of Foo::GetValue = 0x00FE1300
It clearly shows that the virtual function GetValue is at offset 0 of the virtual table. But the assembly output of vtab.cpp seems to imply that GetValue is at offset 4 (see the comments starting with three semicolons below):
; COMDAT ??_7Foo@@6B@
CONST SEGMENT
??_7Foo@@6B@ DD FLAT:??_R4Foo@@6B@ ; Foo::`vftable'
DD FLAT:?GetValue@Foo@@UAEHXZ ;;; GetValue at offset 4
CONST ENDS
; Function compile flags: /Odtp
; COMDAT ??0Foo@@QAE@XZ
_TEXT SEGMENT
_this$ = -4 ; size = 4
??0Foo@@QAE@XZ PROC ; Foo::Foo, COMDAT
; _this$ = ecx
push ebp
mov ebp, esp
push ecx
mov DWORD PTR _this$[ebp], ecx
mov eax, DWORD PTR _this$[ebp]
mov DWORD PTR [eax], OFFSET ??_7Foo@@6B@ ;;; Init ptr to virtual table
mov eax, DWORD PTR _this$[ebp]
mov esp, ebp
pop ebp
ret 0
??0Foo@@QAE@XZ ENDP ; Foo::Foo
Thanks for your answers!
Update
@Hans Passant This seems to be a bug. I ran ml /c on the assembly output vtab.asm (after deleting a few symbols) and linked it with check.obj to get an executable, vtab2.exe. But vtab2.exe doesn't run correctly. Then I modified the following code
; COMDAT ??_7Foo@@6B@
CONST SEGMENT
??_7Foo@@6B@ DD FLAT:??_R4Foo@@6B@ ; Foo::`vftable'
DD FLAT:?GetValue@Foo@@UAEHXZ
CONST ENDS
to
; COMDAT ??_7Foo@@6B@
CONST SEGMENT
__NOT_USED_ DD FLAT:??_R4Foo@@6B@ ; Foo::`vftable'
??_7Foo@@6B@ DD FLAT:?GetValue@Foo@@UAEHXZ
CONST ENDS
and ran ml and link again to get vtab3.exe. Now vtab3.exe runs correctly and produces output similar to vtab.exe.
I don't think Microsoft would consider this a bug. Yes, the assembly output should have the vtable symbol on the second element of the vtable, so that the RTTI entry appears at offset -4 from the symbol. However, the table should also be in a COMDAT section, and instead there's only a comment in the assembly output (; COMDAT) that indicates this. That's because while the PECOFF object file format supports COMDAT sections, the assembler (MASM, invoked as ml) doesn't. There's no way for the compiler to generate an assembly file that actually corresponds to the contents of the object file it creates.
Or, to put it another way, the assembly output isn't meant to be assembled; it's only meant to be informative. Even with your fix applied, the assembly output doesn't generate the same object file the compiler does. If you did this in a more realistic project where Foo was used in more than one object file, you'd get multiple-definition errors when linking. If you want to see the real output of the compiler, you need to look at the object file.
For example, if you run dumpbin /all vtab.obj and go through its output, you'll see something like:
SECTION HEADER #C
.rdata name
...
40301040 flags
Initialized Data
COMDAT; sym= "const Foo::`vftable'" (??_7Foo@@6B@)
4 byte align
Read Only
RAW DATA #C
00000000: 00 00 00 00 00 00 00 00 ........
RELOCATIONS #C
Symbol Symbol
Offset Type Applied To Index Name
-------- ---------------- ----------------- -------- ------
00000000 DIR32 00000000 34 ??_R4Foo@@6B@ (const Foo::`RTTI Complete Object Locator')
00000004 DIR32 00000000 1F ?GetValue@Foo@@UAEHXZ (public: virtual int __thiscall Foo::GetValue(void))
...
COFF SYMBOL TABLE
...
026 00000000 SECTC notype Static | .rdata
Section length 8, #relocs 2, #linenums 0, checksum 0, selection 6 (pick largest)
028 00000004 SECTC notype External | ??_7Foo@@6B@ (const Foo::`vftable')
It's not easy to understand, but all the information about the actual layout of the vtable is there. The symbol for the vtable, ??_7Foo@@6B@ (const Foo::`vftable'), is at offset 00000004 of SECTC, i.e. section number 0xC. Section #C is 8 bytes long and has relocations for the RTTI locator and Foo::GetValue that are applied at offsets 00000000 and 00000004 of the section. So you can see that in the object file the vtable symbol does in fact point to the entry containing the pointer to the first virtual method.
Open Watcom has a utility that can show you the contents of an object file in a more assembly-like fashion, though notably not in the syntax that MASM uses. Running wdis t279.obj shows:
.new_section .rdata, "dr2"
0000 00 00 00 00 .long ??_R4Foo@@6B@
0004 ??_7Foo@@6B@:
0004 00 00 00 00 .long ?GetValue@Foo@@UAEHXZ