Search code examples
pythonfile-iocompressiongzipbzip2

Using python print() with a bzip2/gzip2/etc stream


I've read the documentation on how to write strings into a compressed python file:

with bz2.open ( "/tmp/test.bz2", "w" ) as f:
  f.write ( b"Hello" )

The problem I have is that I've functions accepting a file parameter, which is passed to the print() function, ie:

def produce_out ( out = sys.stdout ):
  # many print ( file = out )
  # invocations of other functions accepting out

Clearly, the cleanest and most modular way to obtain that my output is printed and compressed at the same time would be chaining the two above, ie:

with bz2.open ( "/tmp/test.bz2", "w" ) as f:
  out = compressed_stream_adapter ( f )
  produce_out ( out )

where compressed_stream_adapter() yields some object compatible with the file parameter that print() accepts and which automatically forwards the strings it receives to the compressed stream. This is how the compression works in Java, or how you can use the pipe operator in Linux shells to compress any kind of output (which also parallelises its endpoints, but that's not very important here).

My question is: does anything like compressed_stream_adapter() exist in python? Is there a different way to do it that does not require to change existing code?

Note that I already know I could do: out = io.StringIO () and later: f.write ( out.getvalue ().encode () ). However, that's not good when I have to dynamically dump big amounts of data to files (which indeed, is why I want to compress them).


Solution

  • Answering myself: I guess there isn't any off-the-shelf way to do that.

    So, I've followed the Dan Mašek comments and implemented a little wrapper, which relies on the fact that print() expects an object having a write method:

    class BinaryWriter:
        def __init__ ( self, bin_out, encoding = "utf-8", errors = 'strict' ):
            self.bin_out = bin_out
            self.encoding = encoding
            self.errors = errors
            
        def write ( self, s: str ):
            self.bin_out.write ( s.encode ( self.encoding, self.errors ) )
    
        def close ( self ):
            self.bin_out.close ()
    
    

    Usage:

    with bz2.open ( file_path, "w" ) as bout
        out = BinaryWriter ( bout )
        print ( "Hello, world", file = out )
        my_output ( out ) # Uses print( ..., file = out )
    

    If compression is optional:

    out = open ( file_path, mode = "w" ) if not file_path.endswith ( ".bz2" ) \
                else BinaryWriter ( bz2.open ( file_path, "w" ) )
    try:
        my_output ( out )
    finally:
        out.close ()