python fortran ctypes

Difference between array of c_char inside and outside a structure in python for fortran library


I'm interfacing a Fortran library with Python using ctypes. I initialize structures in Python, pass them to Fortran, which populates them, and read them back in Python. Everything works fine with arrays of numbers, but now I'm stuck on interfacing string arrays.

I tried an example like this one and it worked, but in that case the c_char array is not in a structure. So I modified the example to put the c_char array inside a structure. Here is the code I used, with and without the structure:

Python code:

    from ctypes import *
    lib = CDLL("./libf.so")

    if 1:
        print(">>> Without structure")
        func = getattr(lib, "fortran2py_")
        nstring = pointer(c_int(2))  # the Fortran side declares nstring as integer(c_int)
        carr = (c_char * 255)()
        func.argtypes = [POINTER(c_int), POINTER(c_char)]

        print(type(carr))
        print('before:',carr)
        func(nstring, carr)
        str1, str2 = ''.join([v.decode("utf-8") for v in carr]).rstrip("\x00").split("\x00")
        print(str1, str2)


    class Struct0(Structure):
        _fields_ = [
            ("name", c_char * 255),
        ]

    if 1:    
        print(">>> With structure")
        func = getattr(lib, "fortran2pystr_")
        nstring = pointer(c_int(2))  # the Fortran side declares nstring as integer(c_int)
        carr = Struct0()
        func.argtypes = [POINTER(c_int), POINTER(Struct0)]
        print(type(carr.name))
        print('before:',carr.name)
        func(nstring, byref(carr))
        print('after:',carr.name)

Fortran code:

    module c_interop

        use iso_c_binding
        implicit none
        integer, parameter :: STRLEN = 64

        type, bind(c) :: charStr
           character(c_char)  :: name(255)
        end type charStr

        contains

        subroutine fortran2py(nstring, cstring_p) bind(C, name="fortran2py_")
            integer(c_int), intent(in) :: nstring
            character(c_char), dimension(*), intent(inout) :: cstring_p
            integer :: i, j, ks, kf, n
            character(len=STRLEN) :: mystr(2)

            mystr(1) = "This is the first string."
            mystr(2) = "Wow. Fortran + Python + Strings = Pain !"
            ks = 1
            do i = 1, nstring
                n = len_trim(mystr(i))
                kf = ks + (n - 1)  
                cstring_p(ks:kf) = transfer(mystr(i)(1:n), cstring_p(ks:kf))
                cstring_p(kf + 1) = c_null_char
                ks = ks + n + 1
            enddo
        end subroutine fortran2py

        subroutine fortran2pystr(nstring, cstring_p) bind(C, name="fortran2pystr_")
            integer(c_int), intent(in) :: nstring
            type(charStr), intent(inout) :: cstring_p
            integer :: i, j, ks, kf, n
            character(len=STRLEN) :: mystr(2)

            mystr(1) = "This is the first string."
            mystr(2) = "Wow. Fortran + Python + Strings = Pain !"
            ks = 1
            do i = 1, nstring
                n = len_trim(mystr(i))
                kf = ks + (n - 1)  
                cstring_p%name(ks:kf) = transfer(mystr(i)(1:n), cstring_p%name(ks:kf))
                cstring_p%name(kf + 1) = c_null_char
                ks = ks + n + 1
            enddo
        end subroutine fortran2pystr

    end module c_interop

I get no error. However, in the modified part, Fortran should fill the c_char array carr.name by looping over the elements of mystr, but the resulting string contains only the first element. When carr is not a structure but the c_char array itself, Python can read the whole content of mystr.

Output:

>>> Without structure
<class '__main__.c_char_Array_255'>
before: <__main__.c_char_Array_255 object at 0x151b3b092bf8>
This is the first string. Wow. Fortran + Python + Strings = Pain !
>>> With structure
<class 'bytes'>
before: b''
after: b'This is the first string.'

As you can see, the types of carr and carr.name are also not the same. Do you have any idea what is wrong with my modified code? Thank you!


Solution

  • Listing [Python.Docs]: ctypes - A foreign function library for Python.

    The cause is a subtle ctypes behavior. c_char (and also c_wchar) arrays are silently converted to bytes (or str) when they are accessed as fields of a structure. The conversion treats the data as NUL terminated, the same way c_char_p (or c_wchar_p) does, meaning that the "array" is truncated at the first NUL (0x00) char, which is exactly your case. You can check that by looking at the field type.
    I don't know why it works this way (maybe to ease usage), but there are cases when it does more harm than good. It can be reproduced with Python code only.

    code00.py

    #!/usr/bin/env python
    
    import ctypes as cts
    import sys
    
    
    ARR_DIM = 10
    CharArr = cts.c_char * ARR_DIM
    
    
    class CharArrStruct(cts.Structure):
        _fields_ = (
            ("data", CharArr),
        )
    
    
    def print_array(arr, text, size=ARR_DIM):
        print(text)
        for i in range(size):
            print("{0:3d}".format(i), end=" - ")
            try:
                print(arr[i])
            except IndexError:
                print("IndexError!!!")
                break
        print()
    
    
    def main(*argv):
        arr = CharArr()
        sarr = CharArrStruct()
        print("Array (plain) type: {0:}".format(type(arr)))
        print("Array (in structure) type: {0:}".format(type(sarr.data)))
    
        string_separator = b"\x00"
        print("\nString separator: {0:}".format(string_separator))
        text = string_separator.join((b"abcd", b"efgh"))
        arr[0:len(text)] = text
        sarr.data = text
    
        print_array(arr, "Plain array:")
        print_array(sarr.data, "Structure with array:")
        print("Strings (in structure): {0:}".format(sarr.data.split(string_separator)))
    
        string_separator = b"\xFF"
        print("\nString separator: {0:}".format(string_separator))
        sarr.data = string_separator.join((b"abcd", b"efgh"))
    
        print_array(sarr.data, "Structure with array:")
        print("Strings (in structure): {0:}".format(sarr.data.split(string_separator)))
    
    
    if __name__ == "__main__":
        print("Python {:s} {:03d}bit on {:s}\n".format(" ".join(elem.strip() for elem in sys.version.split("\n")),
                                                       64 if sys.maxsize > 0x100000000 else 32, sys.platform))
        rc = main(*sys.argv[1:])
        print("\nDone.\n")
        sys.exit(rc)
    
    

    Output:

    e:\Work\Dev\StackOverflow\q060093054>"e:\Work\Dev\VEnvs\py_pc064_03.07.06_test0\Scripts\python.exe" code00.py
    Python 3.7.6 (tags/v3.7.6:43364a7ae0, Dec 19 2019, 00:42:30) [MSC v.1916 64 bit (AMD64)] 64bit on win32
    
    Array (plain) type: <class '__main__.c_char_Array_10'>
    Array (in structure) type: <class 'bytes'>
    
    String separator: b'\x00'
    Plain array:
      0 - b'a'
      1 - b'b'
      2 - b'c'
      3 - b'd'
      4 - b'\x00'
      5 - b'e'
      6 - b'f'
      7 - b'g'
      8 - b'h'
      9 - b'\x00'
    
    Structure with array:
      0 - 97
      1 - 98
      2 - 99
      3 - 100
      4 - IndexError!!!
    
    Strings (in structure): [b'abcd']
    
    String separator: b'\xff'
    Structure with array:
      0 - 97
      1 - 98
      2 - 99
      3 - 100
      4 - 255
      5 - 101
      6 - 102
      7 - 103
      8 - 104
      9 - IndexError!!!
    
    Strings (in structure): [b'abcd', b'efgh']
    
    Done.
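
The truncation shown above happens when the field is read; bytes that native code wrote after the first NUL are still in the structure's memory. A minimal sketch of how one might check that, assuming a hypothetical `Demo` structure and a hypothetical helper `raw_field` (neither is part of the original code). `ctypes.memmove` stands in for the library call that fills the buffer:

```python
import ctypes


class Demo(ctypes.Structure):  # hypothetical, mirrors CharArrStruct above
    _fields_ = [("data", ctypes.c_char * 10)]


def raw_field(struct, field_name):
    """Return every byte of a field, including those after a NUL."""
    # Accessing the field on the class yields its descriptor,
    # which exposes the field's byte offset and byte size.
    field = getattr(type(struct), field_name)
    return ctypes.string_at(ctypes.addressof(struct) + field.offset, field.size)


demo = Demo()
payload = b"abcd\x00efgh"
# Write straight into the structure's memory, as a native library would.
ctypes.memmove(ctypes.addressof(demo), payload, len(payload))

print(demo.data)               # field access stops at the first NUL
print(raw_field(demo, "data")) # full 10-byte buffer
```

The helper copies the bytes out of the structure, so later changes made by native code are not reflected in the returned value.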
    

    Notes:

    • As seen, the type of the data field changed from a ctypes array to bytes

    • The simplest solution that came to mind was to change the string separator from NUL to another char that you are sure won't appear in any of your strings. I chose 0xFF (255). I think it would also be possible with structures containing ctypes.POINTER(ctypes.c_char), but that would be a bit more complex (also, I didn't test it)

    • My Fortran knowledge is very close to 0, but something doesn't look right with fortran2pystr. I don't know how Fortran types are laid out, but passing a pointer to a struct that wraps a char array from Python (indeed, the struct and the array have the same address) and handling it like a plain char array seems wrong. Changing the struct would potentially be a recipe for disaster
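
Another option, if the Fortran side should keep writing NUL separators, might be to declare the field as an array of c_ubyte instead of c_char. ctypes applies the bytes conversion only to c_char (and c_wchar) arrays, and c_ubyte has the same size, so the structure layout is unchanged. The sketch below is untested against the Fortran library: `Struct0Bytes` and `split_strings` are hypothetical names, and `ctypes.memmove` stands in for the call to fortran2pystr_.

```python
import ctypes

BUF_LEN = 255  # same length as the name field in the question


class Struct0Bytes(ctypes.Structure):  # hypothetical variant of Struct0
    # c_ubyte arrays are returned as array objects, not as truncated bytes
    _fields_ = [("name", ctypes.c_ubyte * BUF_LEN)]


def split_strings(buf, count):
    """Decode the first `count` NUL-terminated strings held in a byte array."""
    parts = bytes(buf).split(b"\x00")
    return [part.decode("utf-8") for part in parts[:count]]


carr = Struct0Bytes()
# Stand-in for the Fortran routine: two strings, each followed by a NUL.
payload = (b"This is the first string.\x00"
           b"Wow. Fortran + Python + Strings = Pain !\x00")
ctypes.memmove(ctypes.addressof(carr), payload, len(payload))

strings = split_strings(carr.name, 2)
print(strings)
```

The string count is passed explicitly because the rest of the zero-filled buffer would otherwise split into a run of empty strings.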