Tags: cuda, linear-equation, cusolver

CUDA : cuSolver raises an exception


I am trying to use the cuSOLVER library to solve a number of linear equations, but instead an exception is raised, which is very strange. The code uses only one function from the library; the rest is memory allocation and memory copying. The function is

cusolverSpScsrlsvcholHost(
   cusolverSpHandle_t handle, int m, int nnz,
   const cusparseMatDescr_t descrA, const float *csrVal,
   const int *csrRowPtr, const int *csrColInd, const float *b,
   float tol, int reorder, float *x, int *singularity); 

I think my problem may be in the tol, reorder, and singularity parameters, since the rest are just the matrix parameters. Here is the code:

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <cuda.h>
#include <cusparse.h>
#include <cublas_v2.h>
#include <stdio.h>
#include <cusolverSp.h>

int main()
{
    //initialize our test cases
    const int size = 3;
    int nnz = 6 ;
    int sing = -1 ;

    //float values[] = {0,0,0,0} ;
    float values[] = {1,2,3,4,5,6} ;
    int colIdx[] = {0,0,1,0,1,2};
    int rowPtr[] = {0, 1,3,7};

    float x[] = {4,-6,7};
    float y[3]= {0,0,0} ;

    float *dev_values = 0 ;
    int *dev_rowPtr = 0 ;
    int *dev_colIdx = 0 ;
    float *dev_x = 0 ;
    float *dev_y = 0 ;

    cusolverSpHandle_t solver_handle ;
    cusolverSpCreate(&solver_handle) ;

    cusparseMatDescr_t descr = 0;

    cusparseCreateMatDescr(&descr);
    cusparseSetMatType(descr,CUSPARSE_MATRIX_TYPE_GENERAL);
    cusparseSetMatIndexBase(descr,CUSPARSE_INDEX_BASE_ZERO);

    // Choose which GPU to run on, change this on a multi-GPU system.
    cudaSetDevice(0);

    cudaEvent_t start, stop;
    float time;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);

    // Allocate GPU buffers for the matrix arrays and the vectors.
    cudaMalloc((void**)&dev_x, size * sizeof(float));
    cudaMalloc((void**)&dev_y, size * sizeof(float));
    cudaMalloc((void**)&dev_values, nnz * sizeof(float));
    cudaMalloc((void**)&dev_rowPtr, (size + 1) * sizeof(int));
    cudaMalloc((void**)&dev_colIdx, nnz * sizeof(int));

    //Memcpy
    cudaMemcpyAsync(dev_x, x, size * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpyAsync(dev_values, values, nnz * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpyAsync(dev_rowPtr, rowPtr, (size + 1) * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpyAsync(dev_colIdx, colIdx, nnz * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpyAsync(dev_y, y, size * sizeof(float), cudaMemcpyHostToDevice);

    cusolverSpScsrlsvluHost(solver_handle, size, nnz, descr, dev_values, dev_rowPtr, dev_colIdx, dev_y, 0, 0, dev_x, &sing);


    cudaMemcpyAsync(y, dev_y, size*sizeof(float), cudaMemcpyDeviceToHost );

    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&time, start, stop);
    printf ("Time for the kernel: %f ms\n", time);

    printf("%f\n",y[0]);
    printf("%f\n",y[1]);
    printf("%f\n",y[2]);

    // cudaDeviceReset must be called before exiting in order for profiling and
    // tracing tools such as Nsight and Visual Profiler to show complete traces.

    cudaDeviceReset();
    cudaFree(dev_x);
    cudaFree(dev_y);
    cudaFree(dev_values);
    cudaFree(dev_rowPtr);
    cudaFree(dev_colIdx);
    return 1;
}

Solution

  • There are at least 3 issues in your code:

    1. You are using the Host variant of the function, cusolverSpScsrlsvluHost(). If you review the documentation for cusolverSpScsrlsvluHost, you will find that for the Host MemSpace the function expects all of its pointer arguments to be host pointers, but you are passing device pointers. Doing that will produce segfaults. For each device argument such as dev_values, you will need to pass the equivalent host pointer instead (e.g. values in place of dev_values).

    2. Your CSR matrix formatting is incorrect. This line:

      int rowPtr[] = {0, 1,3,7};
      

      should be this:

      int rowPtr[] = {0, 1,3,6};
      

      The correct row-pointer entry pointing one element past the last non-zero is 6, not 7, since the 6 actual elements are numbered 0..5. This issue can also lead to segfaults. (The dense matrix these CSR arrays describe is sketched right after this list.)

    3. You are passing y and x incorrectly (reversed) to cusolverSpScsrlsvluHost(). Since you have put non-zero values in x, presumably you intend that to be your RHS vector. This vector takes the name b in the documentation, and it is the first vector to be passed. Your y vector then is presumably the solution vector, and it is the last vector to be passed in the order of the arguments (and it takes the name x in the documentation).

    4. I suggest using proper error checking; the fixed code below checks every cuSOLVER and cuSPARSE call, and a minimal checking-macro sketch for the CUDA runtime calls follows it.
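
    To make issue 2 concrete, this is the dense matrix that the corrected CSR arrays describe (reconstructed here from the arrays; the question never writes it out):

    //         | 1  0  0 |
    //     A = | 2  3  0 |
    //         | 4  5  6 |
    //
    float values[6] = {1, 2, 3, 4, 5, 6}; // the 6 non-zeros, stored row by row
    int   colIdx[6] = {0, 0, 1, 0, 1, 2}; // column index of each non-zero
    int   rowPtr[4] = {0, 1, 3, 6};       // row i occupies entries rowPtr[i] .. rowPtr[i+1]-1,
                                          // so the final entry must equal nnz = 6

    Since A is lower triangular, solving A*x = b with b = {4, -6, 7} by forward substitution gives x = {4, -14/3, 43/18}, i.e. approximately {4.000000, -4.666667, 2.388889}, which matches the program output below.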

    The following code has the above issues addressed, and produces a sensible result:

    $ cat t979.cu
    #include <cusparse.h>
    #include <stdio.h>
    #include <cusolverSp.h>
    #include <assert.h>
    
    int main()
    {
        //initialize our test cases
        const int size = 3;
        const int nnz = 6 ;
        int sing = 0;
    
        //float values[] = {0,0,0,0} ;
        float values[nnz] = {1,2,3,4,5,6} ;
        int colIdx[nnz] = {0,0,1,0,1,2};
        int rowPtr[size+1] = {0, 1,3,6};
    
        float x[size] = {4,-6,7};
        float y[size]= {0,0,0} ;
        cusolverStatus_t cso;
        cusolverSpHandle_t solver_handle ;
        cso = cusolverSpCreate(&solver_handle) ;
        assert(cso == CUSOLVER_STATUS_SUCCESS);
        cusparseStatus_t csp;
        cusparseMatDescr_t descr = 0;
    
        csp = cusparseCreateMatDescr(&descr);
        assert(csp == CUSPARSE_STATUS_SUCCESS);
        csp = cusparseSetMatType(descr,CUSPARSE_MATRIX_TYPE_GENERAL);
        assert(csp == CUSPARSE_STATUS_SUCCESS);
        csp = cusparseSetMatIndexBase(descr,CUSPARSE_INDEX_BASE_ZERO);
        assert(csp == CUSPARSE_STATUS_SUCCESS);
        cso = cusolverSpScsrlsvluHost(solver_handle, size, nnz, descr, values, rowPtr, colIdx, x, 0.0,0, y, &sing);
        assert(cso == CUSOLVER_STATUS_SUCCESS);
        printf("%f\n",y[0]);
        printf("%f\n",y[1]);
        printf("%f\n",y[2]);
    
        return 0;
    }
    $ nvcc -o t979 t979.cu -lcusolver -lcusparse
    $ ./t979
    4.000000
    -4.666667
    2.388889
    $
    
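
    The fixed code checks the cuSOLVER and cuSPARSE status returns with assert(). If you keep the device allocations and asynchronous copies from the original code, it is worth checking the CUDA runtime calls as well. Below is a minimal sketch of such a wrapper, assuming only the standard CUDA runtime API; the CHECK_CUDA macro name is illustrative, not part of the toolkit:

    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    // Abort with a readable message if a CUDA runtime call fails.
    #define CHECK_CUDA(call)                                                   \
        do {                                                                   \
            cudaError_t err_ = (call);                                         \
            if (err_ != cudaSuccess) {                                         \
                fprintf(stderr, "CUDA error '%s' at %s:%d\n",                  \
                        cudaGetErrorString(err_), __FILE__, __LINE__);         \
                exit(EXIT_FAILURE);                                            \
            }                                                                  \
        } while (0)

    // Usage with the allocations and copies from the question, e.g.:
    //   CHECK_CUDA(cudaMalloc((void**)&dev_values, nnz * sizeof(float)));
    //   CHECK_CUDA(cudaMemcpy(dev_values, values, nnz * sizeof(float), cudaMemcpyHostToDevice));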

    Also note that there is a completely worked CUDA sample code that demonstrates correct usage of this function.