Tags: c++, linux, intel-media-sdk

MFXInit() in libmfx.a segfaults when called from shared object


(Intel's forum is the more natural place for this question, but I'm posting it here hoping for more activity than the total lack of response there so far.)

I'm unable to create a shared library that uses the Intel Media SDK (Linux server edition) to manipulate H.264 video, and I noticed a problem in the design of the MFX library. As I understand it, programs are supposed to link against the static library, like this:

$ g++ .... -L/opt/intel/mediasdk/lib/lin_x64 -lmfx

However, this libmfx.a library appears to delegate all calls to a dlopened dynamic library /opt/intel/mediasdk/lib64/libmfxhw64.so. It is worth noting that function names (and signatures) exposed by static and dynamic libraries are identical, which is kind of confusing and dangerous.
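
To make the delegation pattern concrete, here is a minimal sketch of a dispatcher that loads a shared object at run time and forwards a call to a symbol inside it. It is not Intel's code: `dispatch_call` and `entry_fn` are hypothetical names, and the `int(int)` signature is an assumption chosen only so the example compiles on its own.

```cpp
#include <dlfcn.h>   // dlopen, dlsym, dlclose, dlerror
#include <cstdio>    // std::fprintf

// Hypothetical entry-point signature, chosen for illustration only.
// It is NOT the real MFXInit prototype.
using entry_fn = int (*)(int);

// Load `so_path`, look up `symbol` in it, and forward `arg` to that function.
// Returns -1 if the library or the symbol cannot be found.
int dispatch_call(const char* so_path, const char* symbol, int arg) {
    void* handle = dlopen(so_path, RTLD_NOW | RTLD_LOCAL);
    if (handle == nullptr) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    // dlsym(handle, ...) searches the loaded object and its dependencies,
    // so this lookup is not confused by a same-named function in the caller.
    auto fn = reinterpret_cast<entry_fn>(dlsym(handle, symbol));
    if (fn == nullptr) {
        std::fprintf(stderr, "dlsym failed: %s\n", dlerror());
        dlclose(handle);
        return -1;
    }

    int result = fn(arg);
    dlclose(handle);
    return result;
}
```

The explicit `dlsym` lookup itself is safe. The risk with identical names lies elsewhere: under default ELF rules, calls that the loaded library makes to its own exported functions are resolved through the global lookup scope first, so a same-named symbol exported by an object that was loaded earlier can take precedence. On older glibc versions this sketch needs `-ldl` at link time.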

While I don't understand the rationale behind this design, it would not be a problem by itself, except that some static/global initialization within the library apparently causes havoc when the (static) libmfx.a is included in a shared object, i.e.:

    +------+     +-----------+
    | main | <-- | mylib.so  |
    +------+     |           |          +---------------+
                 | libmfx.a  | (dlopen) | libmfxhw64.so |
                 |          <-------------              |
                 |+---------+|          |+-------------+|
                 ||MFXInit()||          ||  MFXInit()  ||
                 ||...      ||          ||  ...        ||
                 ||         ||          ||             ||
                 +===========+          +===============+

The above library could be assembled like this:

$ g++ -shared -o mylib.so my1.o my2.o -lmfx

And then (dynamically) linked to main.o like so:

$ g++ -o main main.o mylib.so -ldl

(Note that the additional libdl is necessary to allow libmfx.a to dlopen() libmfxhw64.so.)

Unfortunately, on the first MFXInit() call the program crashes with a segmentation fault (at address 0x400). GDB backtrace:

#0  0x0000000000000400 in ?? ()
#1  0x00007ffff61fb4cd in MFXInit () from /opt/intel/mediasdk/lib64/libmfxhw64-p.so.1.13
#2  0x00007ffff7bd3a1f in MFX_DISP_HANDLE::LoadSelectedDLL(char const*, eMfxImplType, int, int) () from ./lib-a.so
#3  0x00007ffff7bd12b1 in MFXInit () from ./lib-a.so
#4  0x00007ffff7bd09c8 in test_mfx () at lib.c:12
#5  0x0000000000400744 in main (argc=1, argv=0x7fffffffe0d8) at main.c:8

(Observe that MFXInit() at stackframe #3 is the one in libmfx.a whereas the one at #1 is in libmfxhw64.so.)

Note that there is no crash when mylib is built as a static library. Using breakpoints and the disassembler, I captured the following backtrace for the static build. In both builds frame #1 is at MFXInit+424, but the call made from there appears to reach a different MFXQueryVersion in each case (absolute addresses are not comparable because of relocation):

#0  0x00007ffff6411980 in MFXQueryVersion () from /opt/intel/mediasdk/lib64/libmfxhw64-p.so.1.13
#1  0x00007ffff640c4cd in MFXInit () from /opt/intel/mediasdk/lib64/libmfxhw64-p.so.1.13
#2  0x000000000040484f in MFX_DISP_HANDLE::LoadSelectedDLL(char const*, eMfxImplType, int, int) ()
#3  0x00000000004020e1 in MFXInit ()
#4  0x0000000000401800 in test_mfx () at lib.c:12
#5  0x0000000000401794 in main (argc=1, argv=0x7fffffffe0e8) at main.c:8

Because the static and shared Intel libraries expose the same API functions, I could link directly against libmfxhw64.so, but I suppose that bypassing the static "dispatcher" comes without warranty(?)

Could someone explain Intel's idea behind this design? Specifically, why provide a static library that only delegates to an .so with an identical interface?

Also, it appears that the SEGV is caused by static/global data in either libmfx.a or libmfxhw64.so. Is there a way to force a specific execution order on dynamically loaded static/global sections? What is the best approach to debug these kinds of problems?
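
One way to investigate which definition of a duplicated name actually wins is to ask the dynamic loader directly. The sketch below is only an illustration: `owning_object` and `global_winner` are hypothetical helper names, and the code assumes glibc's `dladdr` and `RTLD_DEFAULT` extensions (g++ enables them by default through `_GNU_SOURCE`).

```cpp
#include <dlfcn.h>   // dladdr, dlsym, Dl_info, RTLD_DEFAULT (GNU extensions)
#include <string>

// Return the path of the loaded object that contains `addr`,
// or an empty string if the loader cannot attribute the address.
std::string owning_object(const void* addr) {
    Dl_info info{};
    if (dladdr(addr, &info) != 0 && info.dli_fname != nullptr) {
        return info.dli_fname;
    }
    return "";
}

// Return the path of the object whose definition of `symbol` is found
// first in the global lookup scope, or an empty string if none exists.
std::string global_winner(const char* symbol) {
    void* addr = dlsym(RTLD_DEFAULT, symbol);
    return addr != nullptr ? owning_object(addr) : "";
}
```

Calling `global_winner("MFXQueryVersion")` from the test program after the libraries are loaded would report which object supplies that name to ordinary symbol resolution. Running the program with glibc's `LD_DEBUG=bindings` environment variable gives a similar picture without any code changes.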


Tested with Intel Media SDK R2 (Ubuntu 12) and Intel Media SDK 2015R3-R5 (CentOS 7, 1.13/1.15) on an Intel Haswell i7-4790 @ 3.6 GHz.

If you have a working Intel MSDK setup, please compile my example code to confirm the issue.


Solution

  • (OK, since no one seems eager, I'll do the inelegant thing and post an answer to my own question).

    After considerable research into breaking the unintentional circular linking, I discovered that the ld option --exclude-libs provides solace. Essentially, I was looking for a way to stop the libmfx.a symbols from being exported after they had been used to resolve the dependencies of lib.o while creating the shared object. This can be accomplished by creating the .so like this:

    g++ -shared -o lib-a.so lib.o -L/opt/intel/mediasdk/lib/lin_x64 -lmfx -Wl,--exclude-libs=libmfx
    

    Once the library is created like this, Bob's your uncle:

    g++ -o main-so-a main.o lib-a.so -ldl
    

    (Note that libdl is still needed because Intel's MFX dispatcher (now inside lib-a.so) still uses dlopen to discover libmfxhw64.so.)

    From the ld man page:

       --exclude-libs lib,lib,...
           Specifies a list of archive libraries from which symbols should not be
           automatically exported.  The library names may be delimited by commas or
           colons.  Specifying "--exclude-libs ALL" excludes symbols in all archive
           libraries from automatic export.  This option is available only for the
           i386 PE targeted port of the linker and for ELF targeted ports.  For i386
           PE, symbols explicitly listed in a .def file are still exported,
           regardless of this option.  For ELF targeted ports, symbols affected
           by this option will be treated as hidden.
    

    So, essentially the trick is to make sure that the relevant ELF symbols are marked hidden. Normally this would be handled through #pragmas by the library developers (i.e. Intel), but due to their negligence it needs to be retrofitted in this case.

    I suppose the same could have been accomplished with a --version-script map file, but that might have turned out to be more fragile since we want to fully encapsulate libmfx.a anyway.
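
For comparison, this is roughly what marking symbols hidden looks like at the source level when you control the code being compiled. It is a sketch assuming GCC or Clang on an ELF target, and every function name in it is hypothetical.

```cpp
// Assumes GCC or Clang on an ELF target; all names are hypothetical.

// Exported: visible to other shared objects and to dlsym().
extern "C" __attribute__((visibility("default")))
int mylib_public_entry(int x);

// Hidden: callable inside the shared object, but not exported from it,
// so it cannot interpose on (or be interposed by) a same-named symbol.
extern "C" __attribute__((visibility("hidden")))
int mylib_internal_helper(int x) { return x * 2; }

extern "C" int mylib_public_entry(int x) {
    return mylib_internal_helper(x) + 1;
}

// The pragma form applies hidden visibility to a whole region of declarations.
#pragma GCC visibility push(hidden)
extern "C" int mylib_other_helper(int x);
#pragma GCC visibility pop

extern "C" int mylib_other_helper(int x) { return x - 1; }
```

These annotations only affect code compiled with them. The objects inside libmfx.a are already compiled with default visibility, which is why the fix here has to happen at link time with --exclude-libs.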