python rust ctypes access-violation ffi

Access violation when calling Rust function through FFI


As the title states, I get an access violation when I try to call the following Rust code from Python.

Here's the Rust code:

#![crate_type = "dylib"]

extern crate libc;

use libc::c_char;
use std::ffi::CStr;
use std::str;

#[repr(C)]
pub struct AdditionalDetail {
    swis: String,
    sbl: String,
    school_code: String,
    land_assessed_value: u32,
    deed_book: String,
    deed_page: String,
}

#[no_mangle]
pub extern fn parse_details(l: *const c_char) -> AdditionalDetail{
    let _line = unsafe {
        assert!(!l.is_null());
        CStr::from_ptr(l)
    };
    let line = str::from_utf8(_line.to_bytes()).unwrap();
    let _swis = line[52..58].to_owned();
    let _sbl = line[58..78].to_owned();
    let _school_code = line[371..377].to_owned();
    let _land_assessed_value = line[824..836].parse::<u32>().ok().expect("Couldn't convert to an int");
    let _deed_book = line[814..819].to_owned();
    let _deed_page = line[819..824].to_owned();
    AdditionalDetail{swis: _swis, sbl: _sbl, school_code: _school_code, deed_page: _deed_page,
                     land_assessed_value: _land_assessed_value, deed_book: _deed_book}
}

And the Python code I'm using to call it:

from ctypes import cdll, c_uint32, Structure, c_char_p


class TaxDetail(Structure):
    _fields_ = [('swis', c_char_p),
                ('sbl', c_char_p),
                ('school_code', c_char_p),
                ('land_assessed_value', c_uint32),
                ('deed_book', c_char_p),
                ('deed_page', c_char_p), ]

    def __str__(self):
        return str(self.swis)


lib = cdll.LoadLibrary(r"C:\Rust Workspace\embed\target\release\embed.dll")
lib.parse_details.restype = TaxDetail
lib.parse_details.argtypes = (c_char_p,)
result = lib.parse_details(b"1346011          63 WAP WEST  LLC    00000101       13460100615800142703690000  63 Wap West  LLC              10 Fair Oaks Dr               Poughkeepsie, NY 12603                                                                                                                                                                                            000500000150000000017135601       14270369   411      000001 1        4-6Church St                            0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000006158-14-270369-0000      058369006127002200002074000000052500000000286000N    0000000028600000000000000000000000000000000000Y")
print(result)

I've added println! calls to my Rust code, and the access violation seems to occur when it tries to create and return the struct. The specific error message I'm getting is Process finished with exit code -1073741819 (0xC0000005).

This happens with 32-bit Rust and Python on 64-bit Windows 10.


Solution

  • I'm not sure of the full extent of the issues, but I know this one isn't going to be good: you cannot return a String through FFI.

    A Rust String is conceptually 3 parts: a pointer to a chunk of memory, how long that memory is (the capacity), and how much of that memory is a valid string (the length).

    Compare that to a C string. A C string is just a pointer to memory. You don't know how much memory there is, and you only know the valid length by walking down every single byte until you get to a NUL byte.

    Even more than that, a String isn't marked as #[repr(C)], so the actual layout of the String structure is up to the Rust compiler.

    I suspect that the error occurs because Python sees you are returning a c_char_p (which I assume is a char *). It then tries to read a pointer's worth of data and then moves to the next pointer. The "pointer" it reads may be the String's pointer or length or capacity, and as soon as it reads the second one it's off in the weeds somewhere.
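To make the size mismatch concrete, here is a minimal sketch that compares a String with a C-style pointer. The helper name string_size_in_words is made up for illustration. On current compilers a String occupies three pointer-sized words, but the language does not guarantee that size or the order of the fields.

```rust
use std::mem::size_of;
use std::os::raw::c_char;

/// Size of `String` measured in pointer-sized words (hypothetical helper).
fn string_size_in_words() -> usize {
    size_of::<String>() / size_of::<*const c_char>()
}

fn main() {
    // What ctypes expects for each c_char_p field: a single pointer.
    println!("char* is {} bytes", size_of::<*const c_char>());
    // What the Rust struct actually stores for each String field.
    println!(
        "String is {} bytes ({} words)",
        size_of::<String>(),
        string_size_in_words()
    );
}
```

So for every field Python reads as one pointer, Rust has written several words, and every later field is read from the wrong offset.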

    Instead, you will need to figure out alternative ways of dealing with this string. A few thoughts:

    1. Manipulate the passed-in string to add NUL bytes at the break points, then return pointers into that large chunk. You need to be careful to not use any of the substrings after the original string has been freed. Also the original string will now look shorter as it has embedded NUL bytes. I also don't know when Python will free the string.
    2. Return an object that holds onto a CString and has methods that return the result of as_ptr.

    Similar logic applies to &str, which is conceptually a pointer to a chunk of memory plus how much of that memory is valid.
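Option 2 could look roughly like the sketch below. It is only an illustration under some assumptions: the names Details, details_new, details_swis, details_sbl and details_free are invented here, only two of the six fields are shown, and the column offsets are copied from the question. The caller never sees the struct layout. It gets an opaque pointer, asks for each string through an accessor, and hands the pointer back to be freed.

```rust
use std::ffi::{CStr, CString};
use std::ops::Range;
use std::os::raw::c_char;
use std::ptr;

/// Opaque to the caller: Python only ever holds a pointer to this.
pub struct Details {
    swis: CString,
    sbl: CString,
}

/// Copies a fixed-width column into an owned, NUL-terminated string.
fn field(line: &str, range: Range<usize>) -> Option<CString> {
    CString::new(line.get(range)?).ok()
}

// When building the shared library, export each function below with
// #[no_mangle] (spelled #[unsafe(no_mangle)] on the 2024 edition).

/// Returns a heap-allocated `Details`, or NULL if the input is NULL,
/// not valid UTF-8, or too short for the columns.
pub extern "C" fn details_new(line: *const c_char) -> *mut Details {
    if line.is_null() {
        return ptr::null_mut();
    }
    let parsed = unsafe { CStr::from_ptr(line) }.to_str().ok().and_then(|line| {
        Some(Details {
            swis: field(line, 52..58)?,
            sbl: field(line, 58..78)?,
        })
    });
    match parsed {
        Some(details) => Box::into_raw(Box::new(details)),
        None => ptr::null_mut(),
    }
}

/// The returned pointer stays valid until `details_free` is called.
pub extern "C" fn details_swis(details: *const Details) -> *const c_char {
    match unsafe { details.as_ref() } {
        Some(d) => d.swis.as_ptr(),
        None => ptr::null(),
    }
}

/// Same contract as `details_swis`, for the sbl column.
pub extern "C" fn details_sbl(details: *const Details) -> *const c_char {
    match unsafe { details.as_ref() } {
        Some(d) => d.sbl.as_ptr(),
        None => ptr::null(),
    }
}

/// Gives ownership back to Rust so the strings are dropped exactly once.
pub extern "C" fn details_free(details: *mut Details) {
    if !details.is_null() {
        drop(unsafe { Box::from_raw(details) });
    }
}

fn main() {
    // Stand-in for the bytes Python would pass: 52 filler columns, then swis and sbl.
    let input = CString::new(format!("{:52}{}{}", "", "134601", "00615800142703690000")).unwrap();
    let details = details_new(input.as_ptr());
    assert!(!details.is_null());
    let swis = unsafe { CStr::from_ptr(details_swis(details)) };
    println!("swis = {}", swis.to_string_lossy());
    details_free(details);
}
```

On the Python side, one way to use this would be to set restype to c_void_p for details_new and to c_char_p for the accessors, pass the handle back with argtypes of (c_void_p,), and call details_free when finished. Because ctypes copies a c_char_p result into a bytes object, the values read this way remain usable after the handle is freed.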