
I am hitting a segmentation fault only when both using AVX and linking to other code that does


I am using Eigen to set up and factorize a sparse linear system as follows (slightly simplified pseudocode):

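// real_t is the library's scalar typedef; triplet_list is a container of triplets, e.g. Eigen::Triplet<real_t>.
Eigen::SparseQR<Eigen::SparseMatrix<real_t>, Eigen::COLAMDOrdering<int>> solver;
Eigen::SparseMatrix<real_t> P(rows, cols);
P.setFromTriplets(triplet_list.begin(), triplet_list.end());
P.makeCompressed();  // SparseQR expects a compressed, column-major matrix
solver.compute(P);   // runs analyzePattern() and factorize(); the crash is reported in factorize()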

This code is within a small library. I am compiling with -mavx -mfma -O2. If I build a simple executable using this library, everything runs fine. If I instead link into another library (in which the C++ sources are built with the same compiler flags, but which also includes CUDA), I get a segmentation fault in Eigen::SparseQR<Eigen::SparseMatrix<real_t>, Eigen::COLAMDOrdering<int>>::factorize. If I compile with -O0 the segmentation fault disappears.

I have not been able to isolate this into a minimal working example; I would appreciate suggestions on how to describe the problem better, or ideas as to what might be going wrong. While vectorization is not critical for this solve, I do need it elsewhere in the library, so simply removing the AVX flags is not a good option.


EDIT: adding some context as requested.

If I compile with -g and run under gdb, the crash occurs at line 98 of Core/util/Memory.h:

   95      /** \internal Frees memory allocated with handmade_aligned_malloc */
   96      inline void handmade_aligned_free(void *ptr)
   97      {
  >98        if (ptr) std::free(*(reinterpret_cast<void**>(ptr) - 1));
   99      }

with the following stack trace:

#0  0x00007ffff12e94dc in free () from /lib64/libc.so.6
#1  0x00007fffe3dadb1f in Eigen::internal::handmade_aligned_free (ptr=<optimized out>) at include/eigen3/Eigen/src/Core/util/Memory.h:98
#2  Eigen::internal::aligned_free (ptr=<optimized out>) at include/eigen3/Eigen/src/Core/util/Memory.h:179
#3  Eigen::aligned_allocator<float>::deallocate (this=<optimized out>, p=<optimized out>) at include/eigen3/Eigen/src/Core/util/Memory.h:763
#4  std::allocator_traits<Eigen::aligned_allocator<float> >::deallocate (__a=..., __n=<optimized out>, __p=<optimized out>) at include/c++/7.3.0/bits/alloc_traits.h:328
#5  std::_Vector_base<float, Eigen::aligned_allocator<float> >::_M_deallocate (this=<optimized out>, __n=<optimized out>, __p=<optimized out>) at include/c++/7.3.0/bits/stl_vector.h:180
#6  std::vector<float, Eigen::aligned_allocator<float> >::_M_default_append (this=0x7fffe3fefc20 <lse_helper_t::singleton()::helper>, __n=<optimized out>) at include/c++/7.3.0/bits/vector.tcc:592
#7  0x00007fffe3dae688 in std::vector<float, Eigen::aligned_allocator<float> >::resize (__new_size=10, this=0x7fffe3fefc20 <lse_helper_t::singleton()::helper>) at include/c++/7.3.0/bits/stl_vector.h:692

If I run with valgrind, I see errors of the form below. However, the program no longer crashes (the same code run outside of valgrind does still segfault).

==16218== Invalid read of size 8
==16218==    at 0x19049B16: handmade_aligned_free (Memory.h:98)
==16218==    by 0x19049B16: aligned_free (Memory.h:179)
==16218==    by 0x19049B16: deallocate (Memory.h:763)
==16218==    by 0x19049B16: deallocate (alloc_traits.h:328)
==16218==    by 0x19049B16: _M_deallocate (stl_vector.h:180)
==16218==    by 0x19049B16: std::vector<float, Eigen::aligned_allocator<float> >::_M_default_append(unsigned long) (vector.tcc:592)
==16218==    by 0x1904A687: resize (stl_vector.h:692)
==16218==  Address 0x3e195558 is 8 bytes before a block of size 8 alloc'd
==16218==    at 0x4C29BE3: malloc (vg_replace_malloc.c:299)
==16218==    by 0x123B7326: Eigen::internal::aligned_malloc(unsigned long) (in /gdn/centos7/0001/x3/prefixes/desmond-dependencies/2.14c7__dc4688ce01c7/lib/libminimax.so)
==16218==    by 0x19049B73: allocate (Memory.h:758)
==16218==    by 0x19049B73: allocate (alloc_traits.h:301)
==16218==    by 0x19049B73: _M_allocate (stl_vector.h:172)
==16218==    by 0x19049B73: std::vector<float, Eigen::aligned_allocator<float> >::_M_default_append(unsigned long) (vector.tcc:571)
==16218==    by 0x1904A687: resize (stl_vector.h:692)
==16218== Invalid free() / delete / delete[] / realloc()
==16218==    at 0x4C2ACDD: free (vg_replace_malloc.c:530)
==16218==    by 0x19049B1E: handmade_aligned_free (Memory.h:98)
==16218==    by 0x19049B1E: aligned_free (Memory.h:179)
==16218==    by 0x19049B1E: deallocate (Memory.h:763)
==16218==    by 0x19049B1E: deallocate (alloc_traits.h:328)
==16218==    by 0x19049B1E: _M_deallocate (stl_vector.h:180)
==16218==    by 0x19049B1E: std::vector<float, Eigen::aligned_allocator<float> >::_M_default_append(unsigned long) (vector.tcc:592)
==16218==    by 0x1904A687: resize (stl_vector.h:692)
==16218== Invalid read of size 8
==16218==    at 0x1905327B: handmade_aligned_free (Memory.h:98)
==16218==    by 0x1905327B: aligned_free (Memory.h:179)
==16218==    by 0x1905327B: conditional_aligned_free<true> (Memory.h:230)
==16218==    by 0x1905327B: conditional_aligned_delete_auto<double, true> (Memory.h:416)
==16218==    by 0x1905327B: ~DenseStorage (DenseStorage.h:542)
==16218==    by 0x1905327B: ~PlainObjectBase (PlainObjectBase.h:98)
==16218==    by 0x1905327B: ~Matrix (Matrix.h:178)
==16218==    by 0x1905327B: Eigen::SparseQR<Eigen::SparseMatrix<double, 0, int>, Eigen::COLAMDOrdering<int> >::factorize(Eigen::SparseMatrix<double, 0, int> const&) (SparseQR.h:360)
==16218==    by 0x19047A28: compute (SparseQR.h:118)

I am attempting to turn this into a minimal reproducible example.


Solution

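  • The described problem usually occurs when compilation units built with different memory-alignment settings are linked together. By default Eigen aligns memory to 16 bytes; with AVX enabled it aligns to 32 bytes, and with AVX512 to 64 bytes. Eigen's allocation helpers are inline functions, so the linker may keep a copy of aligned_malloc from a unit compiled with one alignment and a copy of aligned_free from a unit compiled with another. The valgrind output above is consistent with this: the block was obtained through plain malloc inside aligned_malloc (from libminimax.so) but released through handmade_aligned_free, which expects a pointer stored just before the block.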

    Ideally, you should compile all compilation units for the same target architecture. If you only plan to run on your local machine, it is best to use -march=native (which also enables tuning for the local architecture).

    If you need some parts compiled with AVX enabled and others without, you can manually override Eigen's memory alignment using -DEIGEN_MAX_ALIGN_BYTES=16 or -DEIGEN_MAX_ALIGN_BYTES=32. For consistency, whichever value you choose should be passed to all compilation units that include Eigen, even where it is redundant.
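
To make the failure mode concrete, here is a minimal standard-library sketch of the over-allocate-and-stash scheme that a function like handmade_aligned_free relies on. It is a simplified illustration, not Eigen's actual implementation, and the names sketch_aligned_malloc, sketch_aligned_free and stored_block_start are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Simplified sketch of a "handmade" aligned allocator (NOT Eigen's code).
// It over-allocates, rounds the address up to the next multiple of
// `alignment`, and stores the pointer returned by std::malloc in the
// pointer-sized slot just before the aligned address.
// Assumption: `alignment` is a power of two and at least sizeof(void*).
inline void* sketch_aligned_malloc(std::size_t size, std::size_t alignment) {
    assert(alignment >= sizeof(void*) && (alignment & (alignment - 1)) == 0);
    void* original = std::malloc(size + alignment);
    if (original == nullptr) return nullptr;
    const std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(original);
    const std::uintptr_t mask = ~(static_cast<std::uintptr_t>(alignment) - 1);
    // Round down, then step up by one full alignment unit, so there is
    // always room for the stored pointer in front of the aligned address.
    void* aligned = reinterpret_cast<void*>((raw & mask) + alignment);
    // Remember where the real block starts so it can be freed later.
    *(reinterpret_cast<void**>(aligned) - 1) = original;
    return aligned;
}

// Hypothetical helper for inspection: returns the stashed block start.
inline void* stored_block_start(void* aligned) {
    return *(reinterpret_cast<void**>(aligned) - 1);
}

// Counterpart of the allocator above: reads the stashed pointer and frees it.
// This is only valid for pointers produced by sketch_aligned_malloc.
inline void sketch_aligned_free(void* ptr) {
    if (ptr) std::free(stored_block_start(ptr));
}
```

If a pointer that came straight from std::malloc were passed to sketch_aligned_free, the read of the slot before ptr would land outside the allocated block and std::free would receive a garbage address. That is the pattern valgrind reports above ("Invalid read of size 8 ... 8 bytes before a block"), and it is what can happen when the allocating and freeing code were compiled with different alignment settings.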