
C `FILE` stream from Python BufferedIO object


I am writing a Python binding for a C library function that requires a FILE * handle as an input.

I want the Python caller to pass an open io.BufferedReader object to the function, so as to retain control of the handle, e.g.:

with open(fname, 'rb') as fh:
    my_c_function(fh)

Therefore, I don't want to pass a file name and open the handle inside the C function.

My C wrapper would roughly look like this:

PyObject *my_c_function (PyObject *self, PyObject *args)
{
    FILE *fh;
    if (! PyArg_ParseTuple (args, "?", &fh)) return NULL;
    my_c_lib_function (fh);
    // [...]
}

Obviously, I can't figure out what format unit I should use for "?", or whether I should use something other than PyArg_ParseTuple altogether. The Python C API documentation does not seem to provide any example of how to deal with buffered IO objects (from what I understand, the buffer protocol applies to bytes objects and the like... right?).

It seems like I could look into the file descriptor of the Python handle object within my C wrapper (as if calling fileno()) and create a C file handle from that using fdopen().
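
The idea can be tried out from Python before writing any C: os.dup() wraps dup(), and os.fdopen() builds a Python file object on an existing descriptor, much as fdopen() builds a FILE * on one. This is only a sketch under those assumptions; read_via_duplicate and demo are hypothetical helper names, and a temporary file stands in for the real input.

```python
import os
import tempfile


def read_via_duplicate(fh):
    """Read the rest of fh through a second stream on a duplicated descriptor."""
    fd = os.dup(fh.fileno())  # new descriptor referring to the same open file
    # Closing dup_stream closes only the duplicate, not fh's own descriptor.
    with os.fdopen(fd, 'rb') as dup_stream:
        return dup_stream.read()


def demo():
    with tempfile.TemporaryFile() as fh:
        fh.write(b"hello")
        fh.flush()
        fh.seek(0)  # rewind; the duplicate shares this offset
        data = read_via_duplicate(fh)
        os.fstat(fh.fileno())  # would raise OSError if the original fd were closed
        return data


if __name__ == "__main__":
    print(demo())
```

The same ownership rule would apply in C: whoever creates the duplicate is responsible for closing it.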

A couple of questions:

  1. Is this the most convenient way? Or is there a built-in method in the Python C API that I did not see?
  2. The fileno() documentation mentions: "Return the underlying file descriptor (an integer) of the stream if it exists. An OSError is raised if the IO object does not use a file descriptor." In which cases would that happen? What if I pass a file handle created in Python by something other than open()?
  3. It seems pretty safe to open a read-only C handle on a read-only fd opened by Python, since Python should be guaranteed to close its own handle after the C function returns; however, can anybody think of any pitfalls to this approach?
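
For question 2, a small experiment from the Python side may help. In-memory streams such as io.BytesIO have no descriptor, and their fileno() raises io.UnsupportedOperation, which is a subclass of OSError. The helper name has_real_fd is made up for this sketch.

```python
import io
import tempfile


def has_real_fd(stream):
    """Return True if stream.fileno() yields an OS-level descriptor."""
    try:
        stream.fileno()
    except (AttributeError, OSError):
        # AttributeError: the object has no fileno() at all.
        # OSError: covers io.UnsupportedOperation from in-memory streams.
        return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryFile() as real_file:
        print(has_real_fd(real_file))         # file on disk
    print(has_real_fd(io.BytesIO(b"data")))   # in-memory stream
    print(has_real_fd(object()))              # not a stream at all
```

A C wrapper relying on fileno() would therefore need to report a clean error for such objects rather than assume a descriptor exists.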

Solution

  • Not sure if this is the most reasonable way, but I solved it on Linux as follows:

    #include <Python.h>
    #include <stdio.h>
    #include <unistd.h>

    static PyObject *
    get_fh_from_python_fh (PyObject *self, PyObject *args)
    {
        PyObject *buf;
        if (! PyArg_ParseTuple (args, "O", &buf)) return NULL;
    
        // Get the file descriptor from the Python BufferedIO object.
        // PyObject_AsFileDescriptor() calls buf.fileno() and, on failure,
        // returns -1 with the Python exception already set.
        // FIXME This is not sure to be reliable. See
        // https://docs.python.org/3/library/io.html#io.IOBase.fileno
        int py_fd = PyObject_AsFileDescriptor (buf);
        if (py_fd < 0) return NULL;
    
        // Duplicate the descriptor so that closing the C stream leaves the
        // Python file object open.
        int fd = dup (py_fd);
        if (fd < 0) return PyErr_SetFromErrno (PyExc_OSError);
    
        /*
         * From the Linux man page:
         *
         * > The file descriptor is not dup'ed, and will be closed when the stream
         * > created by fdopen() is closed. The result of applying fdopen() to a
         * > shared memory object is undefined.
         *
         * EDIT: `fd` was already duplicated from the original file
         * descriptor, so `fh` can (must) be closed manually with
         * fclose().
         */
        FILE *fh = fdopen (fd, "rb");
        if (! fh) {
            PyErr_SetFromErrno (PyExc_OSError);
            close (fd);
            return NULL;
        }
    
        // rest of the code...
    }
    

    This was written with only Linux in mind, but so far it does what it needs to do. A better approach would be to gain insight into the BufferedReader object and maybe even find a FILE * in there; but if that is not part of the Python API, it might break in future versions.
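
Two pitfalls seem worth checking with this approach, sketched here from the Python side (the helper names are hypothetical). First, a BufferedReader reads ahead, so the descriptor's offset can be past the position reported by tell(). Second, a descriptor obtained from dup() shares its offset with the original, so reads done by the C library move the Python file as well.

```python
import os
import tempfile


def offsets_after_small_read(payload=b"x" * 100):
    """Return (tell() position, raw descriptor offset) after reading one byte."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    try:
        with open(path, 'rb') as fh:
            fh.read(1)  # the buffer is filled with more than the single byte asked for
            logical = fh.tell()
            raw = os.lseek(fh.fileno(), 0, os.SEEK_CUR)
        return logical, raw
    finally:
        os.remove(path)


def dup_shares_offset():
    """Seek on a duplicated descriptor and report the original's offset."""
    with tempfile.TemporaryFile() as fh:
        fh.write(b"abcdef")
        fh.flush()
        fd2 = os.dup(fh.fileno())
        try:
            os.lseek(fd2, 2, os.SEEK_SET)  # move only the duplicate...
            return os.lseek(fh.fileno(), 0, os.SEEK_CUR)  # ...and query the original
        finally:
            os.close(fd2)


if __name__ == "__main__":
    print(offsets_after_small_read())
    print(dup_shares_offset())
```

Given both effects, it is probably safest to hand the C function a handle that has not been read from yet, and not to rely on the Python object's position afterwards.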