c++ shared-libraries undefined-reference partial-specialization template-instantiation

Shared Library: Undefined Reference with Partial Template Specialization and Explicit Template Instantiation


Say, there is a third-party library that has the following in a header file:

foo.h:

namespace tpl {
template <class T, class Enable = void>
struct foo {
  static void bar(T const&) {
    // Default implementation...
  }
};
}

In the interface of my own library, I'm supposed to provide a specialization of this foo for my own type(s) (in the example below it is an explicit, i.e. full, specialization). So, let's say I have:

xxx.h:

#include <foo.h>

namespace ml {
struct ML_GLOBAL xxx {
  // Whatever...
};
}

namespace tpl {
template <>
struct ML_GLOBAL foo<::ml::xxx> {
  static void bar(::ml::xxx const&);
};
}

where ML_GLOBAL is a compiler-specific visibility attribute that makes the symbols available for dynamic linkage (by default, my build system hides all symbols in the produced shared library).
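The question does not show how ML_GLOBAL is defined. A minimal sketch of what such a macro might look like on GCC/Clang follows; the definition and the `value` member are assumptions for illustration only, not the author's actual code:

```cpp
// Hypothetical definition of ML_GLOBAL (the real one is not shown in the question).
#if defined(__GNUC__) || defined(__clang__)
// Overrides -fvisibility=hidden for the annotated entity.
#  define ML_GLOBAL __attribute__((visibility("default")))
#else
// Fallback for compilers without this attribute: expands to nothing.
#  define ML_GLOBAL
#endif

namespace ml {
struct ML_GLOBAL xxx {
  int value = 0;  // hypothetical member, stands in for "Whatever..."
};
}
```

Note that such an attribute only affects what the compiler emits; a linker or version script applied afterwards can still hide the symbol.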

Now, I don't want to disclose my implementation of bar, so I only declare it in the header, define it in the source file, and add an explicit template instantiation:

xxx.cpp:

#include "xxx.h"

namespace tpl {
void foo<::ml::xxx>::bar(::ml::xxx const&) {
  // My implementation...
}

extern template struct foo<::ml::xxx>;
}

When the time comes to actually use this tpl::foo<::ml::xxx>::bar function in some consumer application (which links against my shared library as well), I get an undefined reference to the tpl::foo<::ml::xxx, void>::bar symbol. Indeed, running nm -CD on the produced shared library shows no trace of any tpl::foo<::ml::xxx, void> symbol.
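To separate the language side from the shared-library side, the same pattern can be sketched in a single file. This is only an illustration: bar returns a std::string here (instead of void, as in the question) so that the selected implementation is observable, and the out-of-line definition that would normally live in xxx.cpp is placed in the same translation unit.

```cpp
#include <string>

namespace tpl {
// Stand-in for the third-party primary template from foo.h.
template <class T, class Enable = void>
struct foo {
  static std::string bar(T const&) { return "default"; }
};
}

namespace ml {
struct xxx {};
}

namespace tpl {
// Explicit (full) specialization: bar is declared but not defined here.
template <>
struct foo<::ml::xxx> {
  static std::string bar(::ml::xxx const&);
};

// Out-of-line definition (would live in xxx.cpp in the real layout).
// No "template <>" prefix is used: members of an explicit specialization
// are defined like members of an ordinary class.
std::string foo<::ml::xxx>::bar(::ml::xxx const&) { return "specialized"; }
}
```

Calling tpl::foo<ml::xxx>::bar(ml::xxx{}) selects the specialization, while tpl::foo<int>::bar(1) still uses the primary template.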

So far, I've tried different combinations of where to put ML_GLOBAL (e.g. on the explicit template instantiation itself, which GCC clearly complains about, unlike Clang) and with/without the second template argument void.

The question is whether this is related to the fact that the original definition has no visibility attribute (ML_GLOBAL) attached, by virtue of coming from a third-party library, or whether I actually missed something here. If I didn't miss anything, am I really forced to expose my implementation in such a scenario? [... *cough* looks more like a compiler deficiency, to be honest *cough* ...]


Solution

  • It turned out to be a false alarm. Nevertheless, it took me a couple of hours to finally remember why this symbol might be invisible to consumers. It's really trivial, but I'm posting it here for future visitors who happen to have the same setup. Basically, if you use either a linker script [1] or a (pure) version script [2] (specified with the --version-script linker option), don't forget to set global visibility for those tpl::foo* third-party-based symbols (or whichever they are in your case). In my case, I originally had the following:

    {
    global:
      extern "C++" {
        ml::*;
        typeinfo*for?ml::*;
        vtable*for?ml::*;
      };
    
    local:
      extern "C++" {
        *;
      };
    };
    

    which I clearly had to change to

    {
    global:
      extern "C++" {
        tpl::foo*;
        ml::*;
        typeinfo*for?ml::*;
        vtable*for?ml::*;
      };
    
    local:
      extern "C++" {
        *;
      };
    };
    

    in order to link everything properly and get the expected result.
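Why the original script hid the symbol can be illustrated with plain glob matching. The sketch below assumes that the linker's pattern matching inside extern "C++" behaves like POSIX fnmatch applied to the demangled name; the helper glob_matches and the constant kBarSymbol are hypothetical names introduced only for this illustration.

```cpp
#include <fnmatch.h>  // POSIX glob matching

// Returns true if the glob pattern matches the whole name.
static bool glob_matches(const char* pattern, const char* name) {
  return fnmatch(pattern, name, 0) == 0;
}

// Demangled name of the symbol from the question, as printed with nm -C.
static const char* const kBarSymbol =
    "tpl::foo<ml::xxx, void>::bar(ml::xxx const&)";
```

With these, glob_matches("ml::*", kBarSymbol) is false because the demangled name starts with tpl::, not ml::, so under the original script only the catch-all * in the local section applies. glob_matches("tpl::foo*", kBarSymbol) is true, which is why adding that pattern to the global section exports the symbol.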

    Hope this helps and regards.

    BONUS


    A curious reader could ask though, "Why the hell are you combining explicit visibility attributes and a linker/version script to control visibility of symbols when there are already the -fvisibility=hidden and -fvisibility-inlines-hidden options which are supposed to do just that?".

    The answer is that they of course do, and I indeed do use them to build my shared libraries. However, there is one catch. It is a common practice to link some internal libraries (privately) used by your shared library statically (into that library), primarily in order to completely conceal such dependencies (keep in mind, though, that header files accompanying your shared library should also be properly designed to implement this). The benefits are clear: clean and controllable ABI and reduced compile times for shared library consumers.

    Take, for example, Boost as the most widespread candidate for such a use case. Encapsulating all of the heavily templated code from Boost privately in your shared library and eliminating any Boost symbols from the ABI will greatly reduce interface pollution and compile times for your shared library consumers, not to mention that your software components will also look professionally developed.

    Anyway, to the point: unless the static libraries that you want to link into your shared library were themselves also built with the -fvisibility=hidden and -fvisibility-inlines-hidden options, their symbols will inevitably still be visible (for instance, through nm -CD <shared-library>), regardless of the fact that you're building the shared library itself with those options. (Expecting them to be built that way would be ridiculous: nobody is going to distribute static libraries with hidden interface symbols by default, as it defeats their purpose.) That is, in this case, you have two options to resolve it:

    1. Manually rebuild those static libraries (your shared library dependencies) with the -fvisibility=hidden and -fvisibility-inlines-hidden options, which is clearly not always possible/practical given their potential third-party origin.
    2. Supply a linker/version script (as is done above) at link time to instruct the linker to forcefully export/hide the proper symbols of your shared library.