Search code examples
c++segmentation-faultgsl

segmentation fault 'gsl_spmatrix_add'


EDIT: I have changed the question to new code that produces the same error and is more reliable in doing so.

I have been struggling to find a segmentation fault in my code for a while now and have boiled it down to the following code:

#include <gsl/gsl_spmatrix.h>

#include <iostream>

using namespace std;

void test_gsl() {
    size_t size = 5;
    size_t nzmax = 5 * 5;
    constexpr size_t threads = 5;

    // allocate
    gsl_spmatrix* thread_matrices[threads];
    for (size_t thread = 0; thread < threads; thread++) {
        thread_matrices[thread] = gsl_spmatrix_alloc_nzmax(size, size, nzmax, GSL_SPMATRIX_TRIPLET);
    }

    // set
    for (size_t i = 0; i < threads; i++) {
        gsl_spmatrix_set(thread_matrices[i], 0, 0, 1.0);
    }

    // crs
    for (size_t i = 0; i < threads; i++) {
        gsl_spmatrix* temp = thread_matrices[i];
        thread_matrices[i] = gsl_spmatrix_crs(thread_matrices[i]);
        gsl_spmatrix_free(temp);
    }

    // add to total
    gsl_spmatrix* total_matrix = gsl_spmatrix_alloc_nzmax(size, size, nzmax, GSL_SPMATRIX_CRS);
    gsl_spmatrix* total_copy = gsl_spmatrix_alloc_nzmax(size, size, nzmax, GSL_SPMATRIX_CRS);
    for (size_t i = 0; i < threads; i++) {
        gsl_spmatrix_memcpy(total_copy, total_matrix);  // this is required to avoid another segfault
        gsl_spmatrix_add(total_matrix, total_copy, thread_matrices[i]); // unknown segfault!
    }

    gsl_spmatrix_free(total_matrix);
    gsl_spmatrix_free(total_copy);
}

int main(int argc, char* argv[]) {
    
    test_gsl();
    printf("end\n");

    return 0;
}

When I run this I consistently get the following output:

Segmentation fault (core dumped)

The segmentation fault is on the line with gsl_spmatrix_add(total_matrix, total_copy, thread_matrices[i]);.

I'm compiling this code using cmake:

cmake_minimum_required(VERSION 3.22.1)

project(diskmodel)

set(CMAKE_CXX_STANDARD 14)
set(CMAKE_CXX_STANDARD_REQUIRED YES)

add_subdirectory("src")
project(galaxy)

find_package(GSL REQUIRED)

add_executable(${PROJECT_NAME} main.cpp)

set_target_properties(${PROJECT_NAME} PROPERTIES OUTPUT_NAME "${PROJECT_NAME}" SUFFIX ".exe")

target_link_libraries(${PROJECT_NAME} GSL::gsl GSL::gslcblas )

What is causing this seg fault?

EDIT:

After compiling with: g++ 'gsl-config --libs' main.cpp -fsanitize=undefined -g I get the same output as before. When compiling with address I get:

=================================================================
==31330==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 400 byte(s) in 5 object(s) allocated from:
    #0 0x7efd44b64a06 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:153
    #1 0x7efd449d393e in gsl_spmatrix_alloc_nzmax (/lib/x86_64-linux-gnu/libgsl.so.23+0x1f893e)

Indirect leak of 240 byte(s) in 5 object(s) allocated from:
    #0 0x7efd44b64808 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0x7efd449d3b6c in gsl_spmatrix_alloc_nzmax (/lib/x86_64-linux-gnu/libgsl.so.23+0x1f8b6c)

Indirect leak of 200 byte(s) in 5 object(s) allocated from:
    #0 0x7efd44b64808 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0x7efd449d3b88 in gsl_spmatrix_alloc_nzmax (/lib/x86_64-linux-gnu/libgsl.so.23+0x1f8b88)

Indirect leak of 40 byte(s) in 5 object(s) allocated from:
    #0 0x7efd44b64808 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0x7efd449d39ac in gsl_spmatrix_alloc_nzmax (/lib/x86_64-linux-gnu/libgsl.so.23+0x1f89ac)

Indirect leak of 40 byte(s) in 5 object(s) allocated from:
    #0 0x7efd44b64808 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0x7efd449d397d in gsl_spmatrix_alloc_nzmax (/lib/x86_64-linux-gnu/libgsl.so.23+0x1f897d)

When compiling using my cmake file and running gdb galaxy.exe I get the following backtrace:

#0  0x00007ffff7f2c185 in gsl_spblas_scatter () from /lib/x86_64-linux-gnu/libgsl.so.23
#1  0x00007ffff7f2b364 in gsl_spmatrix_add () from /lib/x86_64-linux-gnu/libgsl.so.23
#2  0x00005555555553d2 in test_gsl () at .../src/main.cpp:35
#3  0x0000555555555420 in main (argc=1, argv=0x7fffffffdaf8) at .../src/main.cpp:44

and no history when using -p.

When using ulimit -c unlimited and then running a core file is not generated. I tried looking into this, but I can't seem to find it to generate anywhere and I don't know why.


Solution

  • Looks like a bug in GSL. Please report :-)

    The line

    gsl_spmatrix *total_matrix = gsl_spmatrix_alloc_nzmax(size, size, nzmax, GSL_SPMATRIX_CRS);
    

    is a valid allocator of a GSL sparse matrix. Its initialization, however, is "smart" in that some of its memory buffers are malloced, but not initialized. This refers to the member p. Line 130 of init_source.c (from GSL sources, submodule (directory) spmatrix):

    m->p = malloc((n1 + 1) * sizeof(int));
    

    The next thing your code does is

    gsl_spmatrix_memcpy(total_copy, total_matrix); // this is required to avoid another segfault
    

    Well, the comment is somewhat intriguing, but let's look into the code (lines 93-96 of copy_source.c):

              for (n = 0; n < src->size1 + 1; ++n)
                {
                  dest->p[n] = src->p[n];
                }
    

    Here, size1 seems to be the number of matrix rows, which were declared as 5. So, the code replaces (by copying) rubbish with rubbish. This tells us that GSL does not seem to work well if a matrix declared as having 5 rows has fewer than 5 nonzero rows. I believe this is the solution of your problem. You declared some matrices, e.g. total_matrix and total_copy as having 5 rows, but they actually have none. So far the code is not buggy, however, because copying rubbish onto rubbish is no error.

    The next step in your code:

    gsl_spmatrix_add(total_matrix, total_copy, thread_matrices[i]);
    

    invokes this code related to the member p:

          for (j = 0; j < outer_size; ++j)
            {
              Cp[j] = nz;
    

    This opens a loop, which in your case will be executed 5 times. Here Cp is a shorthand for C->p. The only element of the p member that is initialized so far is thus the j-th one of C in C = A + B. Next, inside this loop we can see:

              /* CSC: x += A(:,j); CSR: x += A(j,:) */
              nz = FUNCTION (spmatrix, scatter) (a, j, w, x, (int) (j + 1), c, nz);
    
    

    Notice that j is passed as the 2nd argument, and incompletely initialised a as the 1st. This invokes spmatrix_scatter defined in line 538 via a macro.

    static size_t
    FUNCTION (spmatrix, scatter) (const TYPE (gsl_spmatrix) * A, const size_t j, int * w,
                                  ATOMIC * x, const int mark, TYPE (gsl_spmatrix) * C, size_t nz)
    {
      int p;
      int * Ai = A->i;
      int * Ap = A->p;
      ATOMIC * Ad = A->data;
      int * Ci = C->i;
    
      for (p = Ap[j]; p < Ap[j + 1]; ++p)
        {
    

    Now, as can be seen, GSL accesses an uninitialized values of Ap[j] and Ap[j + 1]. This results in an immediate segfault a few instructions later.

    Now, how to avoid this?

    Let's look inside the "kosher" way of creating a CSR matrix (lines 152-156, compress_source.c):

          Cp = dest->p;
    
          /* initialize row pointers to 0 */
          for (n = 0; n < dest->size1 + 1; ++n)
            Cp[n] = 0;
    

    Hurray! This is a proper initialization of the p member. By the way, the next few lines explain that the p member in CRS representation is used to store the number of elements in each row. It seems that this is the code missing in gsl_spmatrix_alloc_nzmax

    Conclusion: do not rely on matrices returned by gsl_spmatrix_alloc_nzmax. They should be OK to use as "destination matrices", e.g. as C in C = A + B, but not as zero-filled source ones.

    Hope this helps.

    PS. You can remove this completely unnecessary invocation of gsl_spmatrix_memcpy(total_copy, total_matrix);