Cython float64 error although float32 is specifically set

I am trying to implement @rkp's solution to their own question about speeding up sparse matrix multiplication in Cython, which uses the PyCUDA library (please note that this is the second solution in their post).

After installing pycuda, pymetis, etc. and running exactly the same code (in IDLE, Python 3.5.2), I get:

TypeError: 'numpy.float64' object cannot be interpreted as an integer

It turns out the (reproducible) part that produces this error is:

import numpy as np
import pycuda.autoinit
import pycuda.driver as drv
import pycuda.gpuarray as gpuarray
from pycuda.sparse.packeted import PacketedSpMV
from pycuda.tools import DeviceMemoryPool
from scipy.sparse import csr_matrix

COUNT = 100
N = 5000
P = 0.1
DTYPE = np.float32

#construct objects
np.random.seed(0)
a_dense = np.random.rand(N, N).astype(DTYPE)
a_dense[np.random.rand(N, N) >= P] = 0
a_sparse = csr_matrix(a_dense)

#PacketedSpMV produces the error
spmv = PacketedSpMV(a_sparse, is_symmetric=False, dtype=DTYPE)

And the full error:

Traceback (most recent call last):
  File "C:/Users/svobodov/Desktop/data/tests/cython/t.py", line 23, in <module>
    spmv = PacketedSpMV(a_sparse, is_symmetric=False, dtype=DTYPE)
  File "C:\Python35\lib\site-packages\pycuda\sparse\packeted.py", line 185, in __init__
    local_row_costs)
  File "pkt_build_cython.pyx", line 22, in pycuda.sparse.pkt_build_cython.build_pkt_data_structure
TypeError: 'numpy.float64' object cannot be interpreted as an integer

I initially thought this was the Cython-related double-precision error, but it is clearly something different, since the code specifically expects an integer rather than a float32.

I tried tweaking pkt_build_cython.pyx, but without success, and I am not confident that I did it properly.

Any ideas on how to resolve this please?

Solution

As identified in comments, this was a result of a missing integer cast within an internal routine in the PyCUDA codebase.
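To illustrate the class of error only (this is not the actual PyCUDA code), here is a minimal sketch of how a NumPy floating-point scalar fails when an integer is required, and how an explicit cast avoids it. The variable `cost` is a hypothetical stand-in for a value computed in floating point.

```python
import operator

import numpy as np

# Hypothetical stand-in for a cost value that was computed in floating point.
cost = np.float64(3.0)

try:
    # operator.index() uses the same integer protocol that range() and
    # integer-typed arguments rely on; NumPy floats do not support it.
    operator.index(cost)
except TypeError as exc:
    message = str(exc)

print(message)

# An explicit integer cast is the kind of fix described above.
as_int = int(cost)
print(list(range(as_int)))
```

The same TypeError appears whenever a `numpy.float64` reaches code that needs a true integer, regardless of the `dtype` chosen for the matrix itself, which would explain why setting float32 made no difference.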

The bug was fixed in 2018, so any PyCUDA release from 2019 onward should contain the corrected code, and this issue should not occur.
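If you are unsure which release you have, a rough check along these lines may help. It is a sketch that assumes PyCUDA's year-based version numbers (for example `2019.1.2`) and that the package is installed under the distribution name `pycuda`; the helper names `release_year` and `predates_fix` are made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def release_year(version_text):
    """Return the leading numeric component of a version such as '2019.1.2'."""
    return int(version_text.split(".")[0])


def predates_fix(version_text, first_fixed_year=2019):
    """True if the release year is earlier than the assumed first fixed year."""
    return release_year(version_text) < first_fixed_year


try:
    installed = version("pycuda")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("pycuda is not installed")
elif predates_fix(installed):
    print("pycuda", installed, "may predate the fix; consider upgrading")
else:
    print("pycuda", installed, "should include the fix")
```

The check is deliberately conservative: a 2018 release is flagged because the fix landed at some point during that year, so only releases from 2019 onward are treated as certainly fixed.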