
Segmentation Fault in Pycuda using NVIDIA's cuSolver Library


I'm trying to write a PyCUDA wrapper, inspired by the scikits-cuda library, for some of the operations in NVIDIA's new cuSolver library. First I need to perform an LU factorization with cusolverDnSgetrf(), but before that I need its Workspace argument. cuSolver provides cusolverDnSgetrf_bufferSize() to compute the required workspace size, but when I call it, the program crashes with a segmentation fault. What am I doing wrong?

Note: I already have this operation working with scikits-cuda, but the cuSolver library uses this kind of workspace argument a lot, and I want to compare scikits-cuda with my own implementation on top of the new library.
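For reference, the full sequence described above (query the workspace size, allocate the workspace, then factorize) might look like the sketch below. It assumes PyCUDA is installed and libcusolver.so is loadable, and it relies on the documented dense signatures cusolverDnSgetrf_bufferSize(handle, m, n, A, lda, int *Lwork) and cusolverDnSgetrf(handle, m, n, A, lda, Workspace, devIpiv, devInfo). The helper names column_major_args() and lu_factorize() are hypothetical. Because cuSolver, like LAPACK, expects column-major storage, the NumPy matrix is transposed before upload so its raw bytes are in column-major order, and lda is set to m.

```python
import ctypes

import numpy as np


def column_major_args(matrix):
    """Hypothetical helper: float32 buffer holding `matrix` in column-major
    order, plus the (m, n, lda) arguments getrf expects."""
    a = np.asarray(matrix, dtype=np.float32)
    m, n = a.shape
    # A C-contiguous copy of A.T has exactly A's column-major byte layout.
    buf = np.ascontiguousarray(a.T)
    return buf, m, n, max(1, m)  # lda must be >= max(1, m)


def lu_factorize(matrix, lib_name='libcusolver.so'):
    """Sketch: LU-factorize `matrix` on the GPU with cusolverDnSgetrf.
    Returns (packed L and U factors, pivot indices, devInfo)."""
    # Imported here so column_major_args() is usable without a GPU.
    import pycuda.autoinit  # noqa: F401 (creates a CUDA context)
    from pycuda import gpuarray

    c_int, c_void_p = ctypes.c_int, ctypes.c_void_p
    lib = ctypes.cdll.LoadLibrary(lib_name)
    lib.cusolverDnCreate.argtypes = [ctypes.POINTER(c_void_p)]
    lib.cusolverDnDestroy.argtypes = [c_void_p]
    lib.cusolverDnSgetrf_bufferSize.argtypes = [
        c_void_p, c_int, c_int, c_void_p, c_int, ctypes.POINTER(c_int)]
    lib.cusolverDnSgetrf.argtypes = [
        c_void_p, c_int, c_int, c_void_p, c_int, c_void_p, c_void_p, c_void_p]

    def check(status, name):
        if status != 0:  # 0 is CUSOLVER_STATUS_SUCCESS
            raise RuntimeError('%s failed with status %d' % (name, status))

    buf, m, n, lda = column_major_args(matrix)
    handle = c_void_p()
    check(lib.cusolverDnCreate(ctypes.byref(handle)), 'cusolverDnCreate')
    try:
        a_gpu = gpuarray.to_gpu(buf)
        # Step 1: the library writes Lwork into a HOST-side int.
        lwork = c_int()
        check(lib.cusolverDnSgetrf_bufferSize(
            handle, m, n, int(a_gpu.gpudata), lda, ctypes.byref(lwork)),
            'cusolverDnSgetrf_bufferSize')
        # Step 2: allocate workspace (Lwork floats), pivots and info on the GPU.
        work = gpuarray.empty(max(1, lwork.value), np.float32)
        ipiv = gpuarray.empty(min(m, n), np.int32)
        info = gpuarray.zeros(1, np.int32)
        # Step 3: factorize in place; A is overwritten with L and U.
        check(lib.cusolverDnSgetrf(
            handle, m, n, int(a_gpu.gpudata), lda, int(work.gpudata),
            int(ipiv.gpudata), int(info.gpudata)), 'cusolverDnSgetrf')
        lu = a_gpu.get().T  # undo the transpose: back to shape (m, n)
        return lu, ipiv.get(), int(info.get()[0])
    finally:
        lib.cusolverDnDestroy(handle)
```

With x from the question, usage would be `lu, ipiv, info = lu_factorize(x)`. A positive info means a zero pivot was found on the diagonal of U, and the pivot indices should follow the LAPACK convention.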


import numpy as np
import pycuda.autoinit  # creates a CUDA context
from pycuda import gpuarray
import ctypes
import ctypes.util

libcusolver = ctypes.cdll.LoadLibrary('libcusolver.so')

class _types:
    handle = ctypes.c_void_p

libcusolver.cusolverDnCreate.restype = int
libcusolver.cusolverDnCreate.argtypes = [_types.handle]

def cusolverCreate():
    handle = _types.handle()
    libcusolver.cusolverDnCreate(ctypes.byref(handle))
    return handle.value

libcusolver.cusolverDnDestroy.restype = int
libcusolver.cusolverDnDestroy.argtypes = [_types.handle]

def cusolverDestroy(handle):
    libcusolver.cusolverDnDestroy(handle)


libcusolver.cusolverDnSgetrf_bufferSize.restype = int
libcusolver.cusolverDnSgetrf_bufferSize.argtypes =[_types.handle,
                                       ctypes.c_int,
                                       ctypes.c_int,
                                       ctypes.c_void_p,
                                       ctypes.c_int,
                                       ctypes.c_void_p]

def cusolverLUFactorization(handle, matrix):
    m,n=matrix.shape
    mtx_gpu = gpuarray.to_gpu(matrix.astype('float32'))
    work=gpuarray.zeros(1, np.float32)
    status=libcusolver.cusolverDnSgetrf_bufferSize(
                          handle, m, n,
                          int(mtx_gpu.gpudata),
                          n, int(work.gpudata))
    print(status)


x = np.asarray(np.random.rand(3, 3), np.float32)
handle_solver=cusolverCreate()
cusolverLUFactorization(handle_solver,x)
cusolverDestroy(handle_solver)

Solution

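  • The last parameter of cusolverDnSgetrf_bufferSize (Lwork) is an output parameter: a pointer to an int in host memory, into which the function writes the required workspace size as a number of float elements. It must therefore be a regular host pointer, not a GPU memory pointer; passing work.gpudata makes the library write through a device address on the host, which crashes. Try modifying the cusolverLUFactorization() function as follows: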

    def cusolverLUFactorization(handle, matrix):
        m,n=matrix.shape
        mtx_gpu = gpuarray.to_gpu(matrix.astype('float32'))
    
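        # Lwork is an output parameter: cuSolver writes the required
        # workspace size (in float elements) into this host-side int.
        work = ctypes.c_int()
        status = libcusolver.cusolverDnSgetrf_bufferSize(
                             handle, m, n,
                             int(mtx_gpu.gpudata),
                             n, ctypes.pointer(work))
        print(status)      # 0 means CUSOLVER_STATUS_SUCCESS
        print(work.value)  # number of float elements to allocate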
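Why a host pointer is required can be illustrated without a GPU. The sketch below builds a hypothetical stand-in, fake_getrf_bufferSize, with ctypes.CFUNCTYPE so that it has the same parameter list as the real function; its size rule (m * n) is made up for illustration and is not what cuSolver computes. Like the real function, it writes the result through the Lwork pointer on the host side. That write succeeds for a ctypes.c_int, but a GPU address is generally not a valid host address, so the real library segfaults when it writes through one.

```python
import ctypes

# Prototype mirroring cusolverDnSgetrf_bufferSize(handle, m, n, A, lda, int *Lwork).
BUFSIZE_PROTO = ctypes.CFUNCTYPE(
    ctypes.c_int,                  # status (0 means success)
    ctypes.c_void_p,               # handle (unused by the mock)
    ctypes.c_int, ctypes.c_int,    # m, n
    ctypes.c_void_p,               # A (a device pointer in the real call; unused here)
    ctypes.c_int,                  # lda
    ctypes.POINTER(ctypes.c_int),  # Lwork: HOST pointer the callee writes into
)


def _fake_bufsize(handle, m, n, a_ptr, lda, lwork):
    # Hypothetical size rule, for illustration only.
    lwork[0] = m * n  # the callee dereferences Lwork on the host
    return 0


# Hypothetical stand-in for the real library function.
fake_getrf_bufferSize = BUFSIZE_PROTO(_fake_bufsize)


def query_workspace(m, n):
    lwork = ctypes.c_int()  # lives in host memory, so the write is safe
    status = fake_getrf_bufferSize(None, m, n, None, m, ctypes.byref(lwork))
    return status, lwork.value
```

ctypes.byref() and ctypes.pointer() both work for this kind of out-parameter; byref() is the lighter choice when the pointer is only passed through to the call.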