Search code examples
cvisual-c++intrinsicsbit-fieldsuefi

Unresolved external symbol __aullshr when optimization is turned off


I am compiling a piece of UEFI C code with Visual Studio 2015 C/C++ compiler.

The compiler is targeting IA32, not X64.

When turning on the optimization with "/O1", the build is OK.

When turning off the optimization with "/Od", the build gives below error:

error LNK2001: unresolved external symbol __aullshr

According to here, there's an explanation why this kind of functions can be called implicitly by the compiler:

It turns out that this function is one of several compiler support functions that are invoked explicitly by the Microsoft C/C++ compiler. In this case, this function is called whenever the 32-bit compiler needs to multiply two 64-bit integers together. The EDK does not link with Microsoft's libraries and does not provide this function.

Are there other functions like this one? Sure, several more for 64-bit division, remainder and shifting.

But according to here:

...Compilers that implement intrinsic functions generally enable them only when a program requests optimization...

So how could such functions still be called when I explicitly turned off the optimization with /Od??

ADD 1 - 2:32 PM 2/16/2019

It seems I am wrong about the __aullshr function.

It is not a compiler intrinsic function. According to here, it turns out to be a runtime library function, whose implementation can be found in: C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\crt\src\intel\ullshr.asm or C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\crt\src\i386\ullshr.asm

Such VC runtime functions are brought in by the compiler for 32-bit applications to do 64-bit operations.

But I still don't know why /O1 can build pass while /Od failed? It seems optimization switch can affect the usage of VC runtime library.

ADD 2 - 4:59 PM 2/17/2019

I found the code that cause the build failure.

It turns out to be some C struct bit field operation. There's a 64-bit C struct which has a lot of bit fields backed by a single UINT64 variable. When I comment out the single line of code that access these bit fields, the build is pass. It seems _aullshr() function is used to access these bit fields when /Od is specified.

Since this is part of the firmware code, I am wondering if it is a good practice to turn-off the optimization with /Od?

ADD 3 - 9:33 AM 2/18/2019

I created below minimal reproducible example for VS2015.

First, there's a static lib project:

(test.c)

typedef unsigned __int64    UINT64;

typedef union {
    struct {
        UINT64 field1 : 16;
        UINT64 field2 : 16;
        UINT64 field3 : 6;
        UINT64 field4 : 15;
        UINT64 field5 : 2;
        UINT64 field6 : 1;
        UINT64 field7 : 1;
        UINT64 field8 : 1;  //<=========
        UINT64 field9 : 1;
        UINT64 field10 : 1;
        UINT64 field11 : 1;
        UINT64 field12 : 1; //<=========
        UINT64 field13 : 1;
        UINT64 field14 : 1;
    } Bits;
    UINT64 Data;
} ISSUE_STRUCT;


int
Method1
(
    UINT64        Data
)
{
    ISSUE_STRUCT              IssueStruct;
    IssueStruct.Data = Data;

    if (IssueStruct.Bits.field8 == 1 && IssueStruct.Bits.field12 == 1) { // <==== HERE
        return 1;
    }
    else
    {
        return 0;
    }
}

Then a Windows DLL project:

(DllMain.c)

#include <Windows.h>
typedef unsigned __int64    UINT64;

int
Method1
(
    UINT64        Data
);

int __stdcall DllMethod1
(
    HINSTANCE hinstDLL,
    DWORD fdwReason,
    LPVOID lpReserved
)
{
    if (Method1(1234)) //<===== Use the Method1 from the test.obj
    {
        return 1;
    }
    return 2;
}

Build process:

First, compile the test.obj:

cl.exe /nologo /arch:IA32 /c /GS- /W4 /Gs32768 /D UNICODE /O1b2 /GL /EHs-c- /GR- /GF /Gy /Zi /Gm /Gw /Od /Zl test.c

(add: VC++ 2015 compiler gives below warning for test.obj:

warning C4214: nonstandard extension used: bit field types other than int

)

Then compile the DllMain.obj:

cl /nologo /arch:IA32 /c /GS- /W4 /Gs32768 /D UNICODE /O1b2 /GL /EHs-c- /GR- /GF /Gy /Zi /Gm /Gw /Od /Zl DllMain.c

Then link the DllMain.obj to the test.obj

link DllMain.obj ..\aullshr\test.obj /NOLOGO /NODEFAULTLIB /IGNORE:4001 /OPT:REF /OPT:ICF=10 /MAP /ALIGN:32 /SECTION:.xdata,D /SECTION:.pdata,D /MACHINE:X86 /LTCG /SAFESEH:NO /DLL /ENTRY:DllMethod1 /DRIVER

It will give below error:

Generating code Finished generating code test.obj : error LNK2001: unresolved external symbol __aullshr DllMain.dll : fatal error LNK1120: 1 unresolved externals

  1. If I remove the bit field manipulation code at HERE in the test.c, the link error will disappear.

  2. If I only remove the /Od from the compile options for the test.c, the link error will disappear.

ADD 4 - 12:40 PM 2/18/2019

Thanks to @PeterCordes in his comment, there's an even simpler way to reproduce this issue. Just invoke below method:

uint64_t shr(uint64_t a, unsigned c) { return a >> c; }

Then compile the source code with below command:

cl /nologo /arch:IA32 /c /GS- /W4 /Gs32768 /D UNICODE /O1b2 /GL /EHs-c- /GR- /GF /Gy /Zi /Gm /Gw /Od /Zl DllMain.c

link DllMain.obj /NOLOGO /NODEFAULTLIB /IGNORE:4001 /OPT:REF /OPT:ICF=10 /MAP /ALIGN:32 /SECTION:.xdata,D /SECTION:.pdata,D /MACHINE:X86 /LTCG /SAFESEH:NO /DLL /ENTRY:DllMethod1 /DRIVER

This issue can be reproduced for:

  • Microsoft (R) C/C++ Optimizing Compiler Version 18.00.40629 for x86 (VS2013)

  • Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24210 for x86 (VS2015)

  • Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24215.1 for x86 (VS2015)

As mandated in the UEFI coding standard 5.6.3.4 Bit Fields :

Bit fields may only be of type INT32, signed INT32, UINT32, or a typedef name defined as one of the three INT32 variants.

So my final solution is to modify the UEFI code to use UINT32 instead of UINT64.


Solution

  • Your build setup for creating UEFI applications omits the static library of helper functions that MSVC's code-gen expects to be available. MSVC's code-gen sometimes inserts calls to helper functions, the same way that gcc does for 64x64 multiply or divide on 32-bit platforms, or various other things. (e.g. popcount on targets without hardware popcnt.)

    In this case hand-holding MSVC into less-stupid code-gen (a good thing in itself) happens to remove all the uses of helper functions for your codebase. That's good but doesn't fix your build setup. It can break again if you add code in the future that needs a helper. uint64_t shr(uint64_t a, unsigned c) { return a >> c; } does compile to include a call to the helper function even at -O2.

    A shift by a constant without optimization uses _aullshr, instead of inlining as shrd / shr. This exact problem (broken -Od builds) would recur with uint64_t x ; x >> 4 or something in your source.

    (I don't know where MSVC keeps its library of helper functions. We think it's a static library that you could link without introducing a DLL dependency (impossible for UEFI), but we don't know if its maybe bundled with some CRT startup code that you need to avoid linking with for UEFI.)


    The un-optimized vs. optimized issue is clear with this example. MSVC with optimization doesn't need the helper function, but its braindead -Od code does.

    For bitfield access, MSVC apparently uses a right shift of the base type of the bitfield member. In your case, you made it a 64-bit type, and 32-bit x86 doesn't have a 64-bit integer shift (except using MMX or SSE2). With -Od even for constant counts it puts the data in EDX:EAX, the shift count in cl (just like for x86 shift instructions), and calls __aullshr.

    • __a = ??
    • ull = unsigned long long.
    • shr = shift right (like the x86 asm instruction of the same name).
    • it takes the shift count in cl, exactly like x86 shift instructions.

    From the Godbolt compiler explorer, x86 MSVC 19.16 -Od, with UINT64 as the bitfield member type.

    ;; from int Method1(unsigned __int64) PROC 
        ...
       ; extract IssueStruct.Bits.field8
        mov     eax, DWORD PTR _IssueStruct$[ebp]
        mov     edx, DWORD PTR _IssueStruct$[ebp+4]
        mov     cl, 57                                    ; 00000039H
        call    __aullshr        ; emulation of   shr  edx:eax,  cl
        and     eax, 1
        and     edx, 0
        ;; then store that to memory and cmp/jcc both halves.  Ultra braindead
    

    Obviously for constant shift and access to only 1 bit, that's easy to optimize, so MSVC doesn't actually call the helper function at -O2. It's still pretty inefficient, though! It fails to fully optimize away the 64-bitness of the base type, even though none of the bitfields are wider than 32.

    ; x86 MSVC 19.16 -O2   with unsigned long long as the bitfield type
    int Method1(unsigned __int64) PROC                              ; Method1, COMDAT
        mov     edx, DWORD PTR _Data$[esp]       ; load the high half of the inputs arg
        xor     eax, eax                         ; zero the low half?!?
        mov     ecx, edx                         ; copy the high half
        and     ecx, 33554432       ; 02000000H  ; isolate bit 57
        or      eax, ecx                         ; set flags from low |= high
        je      SHORT $LN2@Method1
        and     edx, 536870912      ; 20000000H   ; isolate bit 61
        xor     eax, eax                          ; re-materialize low=0 ?!?
        or      eax, edx                          ; set flags from low |= high
        je      SHORT $LN2@Method1
        mov     eax, 1
        ret     0
    $LN2@Method1:
        xor     eax, eax
        ret     0
    int Method1(unsigned __int64) ENDP                              ; Method1
    

    Obviously this is really stupid materializing a 0 for the low half instead of just ignoring it. MSVC does much better if we change the bitfield member type to unsigned. (In the Godbolt link, I changed that to bf_t so I could use a typedef separate from UINT64, keeping that for the other union member.)


    With a struct based on unsigned field : 1 bitfield members, MSVC doesn't need the helper at -Od

    And it even makes better code at -O2, so you should definitely do that in your real production code. Only use uint64_t or unsigned long long members for fields that need to be wider than 32 bits, if you care about performance on MSVC which apparently has a missed-optimization bug with 64-bit types for bitfield members.

    ;; MSVC -O2 with plain  unsigned  (or uint32_t) bitfield members
    int Method1(unsigned __int64) PROC                              ; Method1, COMDAT
        mov     eax, DWORD PTR _Data$[esp]
        test    eax, 33554432                     ; 02000000H
        je      SHORT $LN2@Method1
        test    eax, 536870912                    ; 20000000H
        je      SHORT $LN2@Method1
        mov     eax, 1
        ret     0
    $LN2@Method1:
        xor     eax, eax
        ret     0
    int Method1(unsigned __int64) ENDP                              ; Method1
    

    I might have implemented it branchlessly like ((high >> 25) & (high >> 29)) & 1 with 2 shr instructions and 2 and instructions (and a mov). If it's really predictable, though, branching is reasonable and breaks the data dependency. clang does a nice job here, though, using not + test to test both bits at once. (And setcc to get the result as an integer again). This has better latency than my idea, especially on CPUs without mov-elimination. clang doesn't have a missed optimization for bitfields based on 64-bit types, either. We get the same code either way.

    # clang7.0 -O3 -m32    regardless of bitfield member type
    Method1(unsigned long long):                            # @Method1(unsigned long long)
        mov     ecx, dword ptr [esp + 8]
        xor     eax, eax           # prepare for setcc
        not     ecx
        test    ecx, 570425344     # 0x22000000
        sete    al
        ret
    

    UEFI coding standard:

    The EDK II coding standard 5.6.3.4 Bit Fields says that:

    • Bit fields may only be of type INT32, signed INT32, UINT32, or a typedef name defined as one of the three INT32 variants.

    I don't know why they make up these "INT32" names when C99 already has perfectly good int32_t. It's also unclear why they'd place this restriction. Perhaps because of the MSVC missed-optimization bug? Or maybe to aid human programmer comprehension by disallowing some "weird stuff".

    gcc and clang don't warn about unsigned long long as a bitfield type, even in 32-bit mode and with -Wall -Wextra -Wpedantic, in C or C++ mode. I don't think ISO C or ISO C++ have a problem with it.

    Further, Should use of bit-fields of type int be discouraged? points out that plain int as a bitfield type should be discouraged because the signedness is implementation-defined. And that the ISO C++ standard discusses bitfield types from char to long long.

    I think your MSVC warning about non-int bitfields must be from some kind of coding-standard-enforcement package, because normal MSVC on Godbolt doesn't do that even with `-Wall.

    warning C4214: nonstandard extension used: bit field types other than int