Python ctypes and mutability


I noticed that passing Python objects to native code with ctypes can break mutability expectations.

For example, if I have a C function like:

#include <stdio.h>

int print_and_mutate(char *str)
{
    str[0] = 'X';
    return printf("%s\n", str);
}

and I call it like this:

from ctypes import *
lib = cdll.LoadLibrary("foo.so")

s = b"asdf"
lib.print_and_mutate(s)

The value of s changed, and is now b"Xsdf".

The Python docs say: "You should be careful, however, not to pass them to functions expecting pointers to mutable memory."

Is this only because it breaks expectations of which types are immutable, or can something else break as a result? In other words, if I go in with the clear understanding that my original bytes object will change, even though normally bytes are immutable, is that OK or will I get some kind of nasty surprise later if I don't use create_string_buffer like I'm supposed to?
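
For comparison, here is a minimal sketch of the create_string_buffer route. It assumes foo.so is not at hand, so ctypes.memset stands in for print_and_mutate writing through its char * argument; with the real library, the buffer would be passed to lib.print_and_mutate instead.

```python
import ctypes

original = b"asdf"

# create_string_buffer copies the bytes into a new, mutable char array
# (4 bytes of data plus a terminating NUL).
buf = ctypes.create_string_buffer(original)

# Stand-in (assumption) for the C function doing str[0] = 'X':
# overwrite the first byte of the buffer in place.
ctypes.memset(buf, ord("X"), 1)

print(buf.value)  # the mutable copy now starts with X
print(original)   # the bytes object itself was never touched
```

Only the copy is modified, so the original bytes object keeps the value Python expects it to have.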


Solution

  • Python relies on immutable objects never changing (for example, a dictionary stores a key's hash when the key is inserted), so mutating them can break things. Here's a concrete example:

    >>> import ctypes as c
    >>> import sys
    >>> x = b'abc'          # immutable bytes object
    >>> d = {x:123}         # Used as a dictionary key (keys must be hashable, which in practice means immutable)
    >>> d
    {b'abc': 123}
    

    Now create a mutable ctypes buffer that overlays the immutable object. In CPython, id(x) is the memory address of the Python object, and sys.getsizeof() returns the size of that object. A PyBytes object begins with some header overhead, and the bytes of the string (plus a trailing NUL) sit at the end of the object.

    >>> sys.getsizeof(x)
    36
    >>> px=(c.c_char*36).from_address(id(x))
    >>> px.raw
    b'\x02\x00\x00\x00\x00\x00\x00\x000\x8fq\x0b\xfc\x7f\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\xf0\x06\xe61\xeb\x00\x1b\xa9abc\x00'
    >>> px.raw[-4:]  # last bytes of the object
    b'abc\x00'
    >>> px[-4]
    b'a'
    >>> px[-4] = b'y'  # Mutate the ctypes buffer, mutating the "immutable" string
    >>> x              # Now it has a modified value.
    b'ybc'
    

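    Now try to access the key in the dictionary. A dictionary locates a key in O(1) time using the key's hash, but the stored hash was computed from the original, "immutable" value, so it no longer matches the key's contents. Looking up b'ybc' fails because its hash differs from the stored one; looking up b'abc' lands on the right entry, but the equality check against the mutated key fails. The key can no longer be found by its old or new value: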

    >>> d           # Note that dictionary key changed, too.
    {b'ybc': 123}
    >>> d[b'ybc']   # Try to access the key
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    KeyError: b'ybc'
    >>> d[b'abc']   # Maybe the original key will work? Its hash matches the stored one...
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    KeyError: b'abc'
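
The failure can be reproduced without touching memory. The sketch below uses a hypothetical CachedHashKey class (not part of Python or ctypes) that computes its hash once from its initial value and compares by current value, a simplified model of how the dictionary sees the mutated bytes object.

```python
class CachedHashKey:
    """Hypothetical stand-in for a key whose hash was fixed before it was mutated."""

    def __init__(self, value: bytes) -> None:
        self.value = value
        self._hash = hash(value)  # computed once, never refreshed

    def __hash__(self) -> int:
        return self._hash

    def __eq__(self, other: object) -> bool:
        return isinstance(other, CachedHashKey) and self.value == other.value


key = CachedHashKey(b"abc")
d = {key: 123}

key.value = b"ybc"  # simulates the in-place mutation done through ctypes

found_by_new_value = CachedHashKey(b"ybc") in d  # hash differs from the stored one
found_by_old_value = CachedHashKey(b"abc") in d  # hash matches, equality fails
found_by_identity = key in d                     # same object, stale hash still matches

print(found_by_new_value, found_by_old_value, found_by_identity)
```

In this model neither the old nor the new value finds the entry. Only a lookup with the very same object succeeds, because it carries the stale hash and the dictionary checks identity before equality.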