
How to construct a Python object that has an underlying non-contiguous buffer (as obtained by `PyObject_GetBuffer`)?


The Python C API includes a function, `PyBuffer_IsContiguous`, for checking whether a returned buffer is contiguous.

How can I construct an object for which that function returns false?

Here are a few examples that do not work (all tried with Python 3.10):

```python
import numpy as np

a = b"123"
b = b"456"
c = bytearray(a[0:1] + b[0:1])
# This one returns a contiguous buffer even if id(b[0]) == id(c[1])
# Who does the copying? Why? Any rules?
np.zeros((2, 3)).T
# PyObject_GetBuffer fails instead of returning a strided array
memoryview(b"123")[::2]
# PyObject_GetBuffer fails instead of returning a strided array
```

Solution

  • c = bytearray(a[0:1] + b[0:1])
    # This one returns a contiguous buffer even if id(b[0]) == id(c[1])
    # Who does the copying? Why? Any rules?
    

    There are actually several separate copies here. `a[0:1]` and `b[0:1]` return new (i.e. copied) bytes objects, not views on the existing bytes objects. The `+` then returns another new bytes object containing the combined contents. Finally, `bytearray(...)` returns a new bytearray object with its own internal memory store. The general rule is that slicing a `bytes` or `bytearray` produces a new object, while slicing a `memoryview` produces a view that shares memory with the original.

    `id(b[0]) == id(c[1])` tells you nothing: `b[0]` and `c[1]` each return a reference to a Python int, and that int does not share memory with the bytes object it came from. What you're seeing is the result of an optimization: CPython keeps a cache of small ints (-5 to 256) and returns a reference to a cached object rather than creating a new one, so you get multiple references to the same int.
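A minimal sketch of the copy-versus-view rule using only built-in types: slicing a `bytearray` copies its contents, while slicing a `memoryview` of the same `bytearray` shares its memory. The variable names `src`, `copied` and `viewed` are illustrative only.

```python
# Slicing a bytearray copies; slicing a memoryview does not.
src = bytearray(b"123")

copied = src[0:1]              # new bytearray with its own storage
viewed = memoryview(src)[0:1]  # view onto src's storage, no copy made

src[0] = ord("9")              # mutate the original buffer in place

print(copied)                  # bytearray(b'1'): unaffected by the mutation
print(viewed.tobytes())        # b'9': sees the mutation through the view

# The expression from the question builds three fresh objects in turn:
# two sliced bytes, their concatenation, and finally the bytearray.
a, b = b"123", b"456"
c = bytearray(a[0:1] + b[0:1])
print(c)                       # bytearray(b'14')
```

Item assignment on `src` is allowed while `viewed` exists; only operations that resize the bytearray would raise `BufferError` while a view is exported.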


Your other two examples are non-contiguous arrays. As suggested in the comments, `PyObject_GetBuffer` takes a `flags` argument that controls what type of buffer it may return, and it raises an error if it can't produce that type of buffer. The documentation has three tables detailing the various flags, but essentially any request whose strides column is marked "yes" can handle a non-contiguous array. Your suggestion (in the comments) of `PyBUF_STRIDES` is probably the simplest flag that will work. Note that `np.zeros((2, 3)).T` is Fortran-contiguous, so `PyBuffer_IsContiguous(&view, 'F')` (and `'A'`) returns 1 for it and only the `'C'` check returns 0, whereas `memoryview(b"123")[::2]` is non-contiguous in every order.
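To check this without writing a C extension, the following CPython-only sketch calls `PyObject_GetBuffer`, `PyBuffer_IsContiguous` and `PyBuffer_Release` through `ctypes.pythonapi`. The `Py_buffer` field layout and the numeric flag values are copied by hand from CPython's headers (the flags are macros, not exported symbols), which is an assumption worth re-checking against your Python version; `is_contiguous` is a hypothetical helper name.

```python
import ctypes
import sys

# CPython-only: ctypes.pythonapi exposes the running interpreter's C API.
if sys.implementation.name != "cpython":
    raise RuntimeError("this sketch requires CPython")


class Py_buffer(ctypes.Structure):
    # Hand-copied layout of CPython's Py_buffer struct (assumed; verify per version).
    _fields_ = [
        ("buf", ctypes.c_void_p),
        ("obj", ctypes.c_void_p),  # exporter, kept as a raw pointer on purpose
        ("len", ctypes.c_ssize_t),
        ("itemsize", ctypes.c_ssize_t),
        ("readonly", ctypes.c_int),
        ("ndim", ctypes.c_int),
        ("format", ctypes.c_char_p),
        ("shape", ctypes.POINTER(ctypes.c_ssize_t)),
        ("strides", ctypes.POINTER(ctypes.c_ssize_t)),
        ("suboffsets", ctypes.POINTER(ctypes.c_ssize_t)),
        ("internal", ctypes.c_void_p),
    ]


# The request flags are C macros, so their values are repeated here by hand.
PyBUF_SIMPLE = 0
PyBUF_ND = 0x0008
PyBUF_STRIDES = 0x0010 | PyBUF_ND

api = ctypes.pythonapi
api.PyObject_GetBuffer.argtypes = [
    ctypes.py_object, ctypes.POINTER(Py_buffer), ctypes.c_int]
api.PyObject_GetBuffer.restype = ctypes.c_int
api.PyBuffer_IsContiguous.argtypes = [ctypes.POINTER(Py_buffer), ctypes.c_char]
api.PyBuffer_IsContiguous.restype = ctypes.c_int
api.PyBuffer_Release.argtypes = [ctypes.POINTER(Py_buffer)]
api.PyBuffer_Release.restype = None


def is_contiguous(obj, flags, order=b"C"):
    """Request a buffer from obj with the given flags and return
    PyBuffer_IsContiguous(view, order) as a bool.

    Raises BufferError if the exporter cannot satisfy the flags
    (ctypes.pythonapi checks the Python error indicator after each call).
    """
    view = Py_buffer()
    api.PyObject_GetBuffer(obj, ctypes.byref(view), flags)
    try:
        return bool(api.PyBuffer_IsContiguous(ctypes.byref(view), order))
    finally:
        api.PyBuffer_Release(ctypes.byref(view))


strided = memoryview(b"123")[::2]  # shape (2,), stride 2: skips every other byte

print(is_contiguous(b"123", PyBUF_SIMPLE))    # True
print(is_contiguous(strided, PyBUF_STRIDES))  # False
try:
    is_contiguous(strided, PyBUF_SIMPLE)  # strides not allowed, so the exporter refuses
except BufferError as exc:
    print("BufferError:", exc)
```

With `PyBUF_SIMPLE` the strided memoryview raises `BufferError`, matching what the question describes; with `PyBUF_STRIDES` the call succeeds and `PyBuffer_IsContiguous` returns 0. From pure Python, `memoryview(obj)` already requests a full strided buffer (`PyBUF_FULL_RO`), so `memoryview(b"123")[::2].contiguous` is `False` as well.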