How to pass file, BytesIO, StringIO in Python to be used later?


These three in-memory or on-disk buffers follow the same access pattern, so I'm going to focus on BytesIO.

How do I pass in a file or buffer object to be used later? I'm having a lot of trouble with the following use case:

import io
import pandas as pd

def get_file_and_metadata():
    metadata = {"foo": "bar"}

    with io.BytesIO() as f:
        f.write(b'content')
        f.seek(0)
        return f, metadata

f, metadata = get_file_and_metadata()

# Do something with file (this is where it fails)
pd.read_csv(f, encoding="utf-8")

I suspect this is because f.close() runs as soon as the return statement leaves the with block.
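
A minimal reproduction without pandas, assuming only the standard library, seems to confirm the suspicion: the buffer comes back already closed, and any read on it raises ValueError.

```python
import io

def get_file_and_metadata():
    metadata = {"foo": "bar"}
    with io.BytesIO() as f:
        f.write(b"content")
        f.seek(0)
        # Leaving the with block, even via return, calls f.close().
        return f, metadata

f, metadata = get_file_and_metadata()
print(f.closed)  # True

error = None
try:
    f.read()
except ValueError as exc:
    # Reading from a closed buffer is rejected.
    error = str(exc)
    print(error)
```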


Solution

  • close() runs when the with suite terminates, and returning from inside the suite terminates it, so the caller receives an already-closed object. If you want to pass back an open file-like object, you should not open it in a with. One option is to drop the context manager completely and leave it up to the caller to clean up the object.

    import io
    import pandas as pd

    def get_file_and_metadata():
        metadata = {"foo": "bar"}
        f = io.BytesIO()
        f.write(b'content')
        f.seek(0)
        # Return the buffer still open; the caller is responsible for closing it.
        return f, metadata

    f, metadata = get_file_and_metadata()
    try:
        # Do something with file
        pd.read_csv(f, encoding="utf-8")
    finally:
        f.close()
    

    This is a reasonable thing to do any time a file object is passed through some sort of pipeline or used in an order that makes context managers inconvenient. In practice, files are usually closed when the object is deleted anyway, at least in CPython, but closing explicitly does not depend on that behavior.
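
Since the object returned by io.BytesIO is itself a context manager, the caller-side try/finally can probably be shortened to a with on the returned object. A minimal sketch, using a plain read() in place of pd.read_csv so it runs with the standard library alone:

```python
import io

def get_file_and_metadata():
    metadata = {"foo": "bar"}
    f = io.BytesIO()
    f.write(b"content")
    f.seek(0)
    return f, metadata  # still open; the caller owns it now

f, metadata = get_file_and_metadata()
with f:  # closes f on exit, like the try/finally version
    data = f.read()

print(data, f.closed)
```

The function stays free of cleanup logic, and the caller decides how long the buffer lives.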

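    Or you could write your own context manager:

    import contextlib
    import io
    import pandas as pd

    @contextlib.contextmanager
    def get_file_and_metadata():
        metadata = {"foo": "bar"}
        f = io.BytesIO()
        f.write(b'content')
        f.seek(0)
        try:
            yield f, metadata
        finally:
            # Runs when the caller's with block exits, even on error.
            f.close()

    # The parentheses are required: without them, metadata would be
    # parsed as a second context manager rather than a tuple target.
    with get_file_and_metadata() as (f, metadata):
        # Do something with file
        pd.read_csv(f, encoding="utf-8")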
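
To check that a custom context manager of this kind cleans up even when the body fails, here is a small sketch. The RuntimeError is only a stand-in for a failing consumer, the leaked variable is a hypothetical helper for inspecting the buffer afterwards, and no pandas call is made so the example needs nothing beyond the standard library. Note the parentheses around the tuple target in the with statement.

```python
import contextlib
import io

@contextlib.contextmanager
def get_file_and_metadata():
    metadata = {"foo": "bar"}
    f = io.BytesIO(b"content")  # initial bytes; position starts at 0
    try:
        yield f, metadata
    finally:
        f.close()

leaked = None  # keeps a reference so the buffer can be inspected later
try:
    with get_file_and_metadata() as (f, metadata):
        leaked = f
        raise RuntimeError("simulated failure while reading")
except RuntimeError:
    pass

print(leaked.closed)  # True
```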
    

    From your comment I realized that the metadata could just be attached to the BytesIO object as an attribute. The function then returns a single object, and the caller can use that object's own context manager.

    import io
    import pandas as pd
    
    def get_file_and_metadata():
        metadata = {"foo": "bar"}
        f = io.BytesIO()
        f.write(b'content')
        f.seek(0)
        f.metadata = metadata
        return f
    
    with get_file_and_metadata() as f:
        pd.read_csv(f, encoding="utf-8")
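
The same attribute trick appears to work for io.StringIO as well, since both buffer types accept arbitrary instance attributes. A minimal sketch, where get_text_and_metadata is a hypothetical name and csv.reader stands in for pd.read_csv so that only the standard library is needed:

```python
import csv
import io

def get_text_and_metadata():
    f = io.StringIO("a,b\n1,2\n")
    f.metadata = {"foo": "bar"}  # plain attribute attached to the buffer
    return f

with get_text_and_metadata() as f:
    rows = list(csv.reader(f))
    metadata = f.metadata  # read the attribute while the buffer is in scope

print(rows, metadata, f.closed)
```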