Search code examples
c++cmalloccythonmemoryview

Cython: safe way to check whether element in malloc memoryview is assigned


I want to create various small memoryviews in function temporarily and release them when the function is finished. Seems initialize the memoryview by malloc is better, based on the comparisons performed in this answer, and actually is faster. As only part of the elements in each of these small memoryviews are assinged, what's the safe way to check whether it is assigned?

Here is the code and comparisons using malloc and numpy's ndarray:

## test_malloc.pyx

cimport cython
import numpy as np
cimport numpy as np

from libc.stdlib cimport malloc, free

np.import_array()


@cython.wraparound(False)
@cython.boundscheck(False)
cdef int _num_assign_malloc():
    cdef:
        Py_ssize_t n = 50
        double *x = <double *> malloc(n * sizeof(double))
        Py_ssize_t i
        int k = 0

    x[5] = 1.
    x[8] = 1.
    x[38] = 1.

    for i in range(n):
        if x[i] == 1.:
            k += 1

    free(x)
    return k


@cython.wraparound(False)
@cython.boundscheck(False)
cdef int _num_assign_ndarray():
    cdef:
        Py_ssize_t n = 50
        double[::1] x = np.zeros(n, dtype=np.float64)
        Py_ssize_t i
        int k = 0

    x[5] = 1.
    x[8] = 1.
    x[38] = 1.

    for i in range(n):
        if x[i] == 1.:
            k += 1

    return k


cpdef test_num_malloc():
    return _num_assign_malloc()


cpdef test_num_ndarray():
    return _num_assign_ndarray()

timeit:

>>> import timeit
>>> t = timeit.Timer("from test_malloc import test_num_malloc; test_num_malloc()")
>>> t.timeit(10000)
# 0.0111
>>> t = timeit.Timer("from test_malloc import test_num_ndarray; test_num_ndarray()")
>>> t.timeit(10000)
# 0.0246

When I check what are those unassigned elements in malloced memoryview:

@cython.wraparound(False)
@cython.boundscheck(False)
cdef _malloc_unassign():
    cdef:
        Py_ssize_t n = 20
        double *x = <double *> malloc(n * sizeof(double))
        double[::1] y = np.zeros(n, dtype=np.float64)
        Py_ssize_t i

    x[5] = 1.
    x[8] = 1.

    for i in range(n):
        y[i] = x[i]

    free(x)
    return y


cpdef test_malloc():
    return _malloc_unassign()

I got:

>>> from test_malloc import test_malloc
>>> for t in test_malloc():
...    print(t)
...
    6.230420704259778e-307
    3.5604305343967845e-307
    1.6021930623879718e-306
    2.447635570273665e-307
    1.6911933005114767e-306
    1.0
    1.0570034520433751e-307
    1.2461038328174693e-306
    1.0
    8.066321387750432e-308
    1.201607109425611e-306
    1.691193300407861e-306
    1.2906222865963489e-306
    1.4241722150642528e-306
    1.3351264799003858e-306
    7.56596412080425e-307
    7.565984490202416e-307
    1.6021903463576915e-306
    1.424109743093715e-306
    2.2250738585072396e-306

Seems I can simply cheched whether it is assigned by x[i] == 1., but I don't think this is good enough, especially according to this answer, those unassigned elements are treated as random; and if I used int instead of double.

Edit


Thanks @DavidW for providing calloc. Adding this to my test_malloc.pyx:

@cython.boundscheck(False)
@cython.wraparound(False)
cdef int _num_assign_calloc():
    cdef:
        Py_ssize_t n = 50
        double *x = <double *> calloc(n, sizeof(double))
        Py_ssize_t i
        int k = 0

    x[5] = 1.
    x[8] = 1.
    x[38] = 1.

    for i in range(n):
        if x[i] == 1.:
            k += 1

    free(x)
    return k

timeit:

>>> import numpy as np
>>> time_malloc = []
>>> for i in range(1000):
...     t = timeit.Timer("from core.test_malloc import test_num_malloc; test_num_malloc()")
...     time_malloc.append(t.timeit(10000))
...
...
>>> time_ndarr = []
>>> for i in range(1000):
...     t = timeit.Timer("from core.test_malloc import test_num_ndarray; test_num_ndarray()")
...     time_ndarr.append(t.timeit(10000))
...
...
>>> time_calloc = []
>>> for i in range(1000):
...     t = timeit.Timer("from core.test_malloc import test_num_calloc; test_num_calloc()")
...     time_calloc.append(t.timeit(10000))
...
...
>>> print("mean: %.4f, std: %.4f" % (np.mean(time_malloc), np.std(time_malloc)))
mean: 0.0098, std: 0.0034
>>> print("mean: %.4f, std: %.4f" % (np.mean(time_calloc), np.std(time_calloc)))
mean: 0.0098, std: 0.0026
>>> print("mean: %.4f, std: %.4f" % (np.mean(time_ndarr), np.std(time_ndarr)))
mean: 0.0177, std: 0.0033

Solution

  • No. You cannot reliably work out if an element a malloced array has been assigned. That would require the C compiler to add a whole bunch of extra code and storage to track what has been written to.

    Try calloc, which does produce a zeroed array.

    It's worth emphasising that one of the reasons malloc is fast is because it skips the initialization so you may be after an impossible "best of both worlds" - all the advantages of initialization without any of the time cost. With that said, I'd expect calloc to beat Numpy.