linker linear-algebra lapack intel-mkl

Incompatible function argument list between header and shared lib doesn't crash my program, why?


When building my university project on two different machines, I ran into issues with LAPACK and Intel's implementation of it in MKL. I have already figured out the main problem: I used the LAPACK headers from liblapacke-dev, which differed between the two machines. In one of the headers, each LAPACK function had two additional size_t arguments at the end of its argument list, which have something to do with LAPACK's Fortran origin. Having realized that, and since I was linking against MKL anyway, I decided to use the MKL headers instead, which do not differ in their argument lists.

However, I was curious and tried calling one of the LAPACK functions (dsyev) with the additional arguments while still linking against MKL, whose dsyev does not take them. I expected a segfault or similar, but my test program worked just fine. What could be the reason?

My theory is that, since the arguments are at the end of the argument list, they are simply never accessed and get cleaned up once control returns to the calling function. If that is true, does it mean doing this is fine, or are there other implications that could lead to runtime issues? How could I inspect the possibly misaligned stack before the call to dsyev?

I tried objdump and nm, only to realize that information about argument lists is never part of a shared library, only the symbol names. Due to my lack of experience with GDB, I also couldn't extract any meaningful information from there. How could I find out whether my theory about why this still works is correct?


Here are the relevant excerpts of lapack.h and mkl_lapack.h referring to dsyev_, from both installations.

Container, lapack.h from liblapacke-dev, version 3.10.0-2ubuntu1:

...
/* It seems all current Fortran compilers put strlen at end.
*  Some historical compilers put strlen after the str argument
*  or make the str argument into a struct. */
#define LAPACK_FORTRAN_STRLEN_END
...
#define LAPACK_dsyev_base LAPACK_GLOBAL(dsyev,DSYEV)
void LAPACK_dsyev_base(
    char const* jobz, char const* uplo,
    lapack_int const* n,
    double* A, lapack_int const* lda,
    double* W,
    double* work, lapack_int const* lwork,
    lapack_int* info
#ifdef LAPACK_FORTRAN_STRLEN_END
    , size_t, size_t
#endif
);
#ifdef LAPACK_FORTRAN_STRLEN_END
    #define LAPACK_dsyev(...) LAPACK_dsyev_base(__VA_ARGS__, 1, 1)
#else
    #define LAPACK_dsyev(...) LAPACK_dsyev_base(__VA_ARGS__)
#endif

This results in dsyev having two additional size_t arguments, which the LAPACK_dsyev macro then supplies automatically by appending 1, 1 to the argument list.

Host, lapack.h from liblapacke-dev, version 3.9.0-1build1:

#define LAPACK_dsyev LAPACK_GLOBAL(dsyev,DSYEV)
void LAPACK_dsyev(
    char const* jobz, char const* uplo,
    lapack_int const* n,
    double* A, lapack_int const* lda,
    double* W,
    double* work, lapack_int const* lwork,
    lapack_int* info );

Container, mkl_lapack.h from Intel OneMKL, version 2024.2:

void DSYEV( const char* jobz, const char* uplo, const MKL_INT* n, double* a,
            const MKL_INT* lda, double* w, double* work, const MKL_INT* lwork,
            MKL_INT* info ) NOTHROW;
void dsyev( const char* jobz, const char* uplo, const MKL_INT* n, double* a,
            const MKL_INT* lda, double* w, double* work, const MKL_INT* lwork,
            MKL_INT* info ) NOTHROW;
void dsyev_( const char* jobz, const char* uplo, const MKL_INT* n, double* a,
             const MKL_INT* lda, double* w, double* work, const MKL_INT* lwork,
             MKL_INT* info ) NOTHROW;

(Capslock C-style, C-style, Fortran style)

Host, mkl_lapack.h from Intel OneMKL, version 2021.1.1:

void DSYEV( const char* jobz, const char* uplo, const MKL_INT* n, double* a,
            const MKL_INT* lda, double* w, double* work, const MKL_INT* lwork,
            MKL_INT* info ) NOTHROW;
void DSYEV_( const char* jobz, const char* uplo, const MKL_INT* n, double* a,
             const MKL_INT* lda, double* w, double* work, const MKL_INT* lwork,
             MKL_INT* info ) NOTHROW;
void dsyev( const char* jobz, const char* uplo, const MKL_INT* n, double* a,
            const MKL_INT* lda, double* w, double* work, const MKL_INT* lwork,
            MKL_INT* info ) NOTHROW;
void dsyev_( const char* jobz, const char* uplo, const MKL_INT* n, double* a,
             const MKL_INT* lda, double* w, double* work, const MKL_INT* lwork,
             MKL_INT* info ) NOTHROW;

(Capslock C-style, capslock Fortran style, C-style, Fortran style)


Solution

  • "I expected a segfault or similar, but my test program worked just fine."

    Your expectation is incorrect: "works just fine" is the expected behavior in this case under most calling conventions, even though the C standard formally leaves a call through a mismatched prototype undefined.

    "What could be the reason?"

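    If you read about calling conventions, you'll understand that the extra parameters are passed in additional registers or, once the argument registers are used up, in additional stack slots. On x86-64 Linux (System V ABI), for example, the first six integer or pointer arguments travel in registers, so the two extra size_t values end up in stack slots just above the ones dsyev actually reads.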

    The caller puts these parameters there, but the called routine simply never looks at them (since it doesn't expect them).

    So the only effect is that the caller performs a few unnecessary operations and incurs a tiny overhead.

    P.S.

    In C++, such a mismatch would cause the link to fail with an unresolved symbol, since the number and types of the parameters are encoded into the (mangled) function name, unless the function is declared extern "C".

    P.P.S.

    It is trivial to construct an example of this, e.g.

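    // foo.h
    int foo(int);

    // foo.c (deliberately does not include foo.h, so the compiler never sees the mismatch)
    int foo() { return 42; }

    // main.c
    #include <stdio.h>
    #include "foo.h"
    int main() { printf("%d\n", foo(13)); }  // 13 is passed, but foo never reads it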
    

    Compile and link this with gcc main.c foo.c, verify that it prints 42, then disassemble main (for example with objdump -d, or in GDB) and observe what main does with the 13 parameter, and how foo ignores that parameter.

    P.P.P.S. As PhilMasteG points out, there exist calling conventions (for example, stdcall on 32-bit x86) in which

    1. The arguments are passed on the stack and
    2. The caller puts arguments there, and the callee cleans them up.

    On such a system, a program with a mismatch between the caller and the callee may indeed crash.