Search code examples
pythoncythonsparse-matrixsparse-array

Fast n-dimensional sparse array in Python / Cython


I have an application that involves large n-dimensional arrays which are very sparse. scipy.sparse has a useful 'vectorized getting and setting' feature, so that Cython can be used to populate a sparse matrix quickly.

Of course the scipy package can't handle n-dimensions. There are two packages I have found that do n-dimensional sparse arrays in python sparray and ndsparse. However it seems neither has the vectorized getting and setting feature.

So I need either:

  • a python package for n-dimensional arrays with vectorized get and set or
  • a c library for sparse arrays which I can easily access with Cython or
  • some 'roll your own' option which I guess would require a c equivalent to a python dict

For my purpose I think mapping the n-dimension coordinates back to 1 or two dimensions could work. What would be better though is to have a dict equivalent that i can access fast inside a Cython loop. I assume this rules out the python dict.

Wondering if someone could give me an example of how to use the c++ map object from within Cython?


Solution

  • If you decide to go with the C dict option, you can use the C++ STL's std::map. It's unlikely that you'll find faster or more robust native code that implements a dictionary/map.

    cppmap.pyx:

    # distutils: language = c++
    
    cdef extern from "<map>" namespace "std":
        cdef cppclass mymap "std::map<int, float>":
            mymap()
            float& operator[] (const int& k)
    
    cdef mymap m = mymap()
    cdef int i
    cdef float value
    
    for i in range(100):
        value = 3.0 * i**2
        m[i] = value
    
    print m[10]
    

    setup.py:

    from distutils.core import setup
    from Cython.Build import cythonize
    setup(name = "cppmapapp"
      ext_modules = cythonize('*.pyx'))
    

    Command line:

    $ python setup.py build
    $ cd build/lib.macosx-10.5-x86_64-2.7
    $ python -c 'import cppmap'
    300.0