Split a zip-file into chunks with Python


I have a piece of code that creates a zip file successfully, but I need to split the file if it is larger than 1 MB.

I have this code but it doesn't work:

    from io import BytesIO
    import zipfile

    from lxml import etree
    from split_file_reader.split_file_writer import SplitFileWriter

    # make element tree
    tree = etree.ElementTree(batch_element)

    # make xml file and write it in stream
    xml_object = BytesIO()
    tree.write(xml_object, pretty_print=True, xml_declaration=False, encoding="utf-8")
    xml_file = xml_object.getvalue()

    final = BytesIO()

    with SplitFileWriter(final, 1_000_000) as sfw:
        with zipfile.ZipFile(sfw, "a") as zip_file:
            zip_file.writestr('Batch.xml', xml_file)

I want to retrieve the split file as bytes. The zipping part works, but the splitting doesn't.


Solution

  • According to the split_file_reader docs, the first argument of SplitFileWriter can be a generator that produces file-like objects. Passing such a generator lets you collect the split zip file as a list of BytesIO chunks, one per generated object.

    Here is a working example script:

    import zipfile
    from io import BytesIO
    from lxml import etree
    from split_file_reader.split_file_writer import SplitFileWriter
    
    # make element tree
    # tree = etree.ElementTree(batch_element)
    tree = etree.parse('/tmp/test.xml')
    
    # make xml file and write it in stream
    xml_object = BytesIO()
    tree.write(xml_object, pretty_print=True, xml_declaration=False, encoding="utf-8")
    xml_file = xml_object.getvalue()
    
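    # every chunk handed to the writer is also stored here
    chunks = []

    def gen(lst):
        # endless generator: create, record and yield a fresh BytesIO per chunk
        while True:
            lst.append(BytesIO())
            yield lst[-1]

    # write the archive through the splitter, at most 1_000_000 bytes per chunk
    with SplitFileWriter(gen(chunks), 1_000_000) as sfw:
        with zipfile.ZipFile(sfw, "w") as zip_file:
            zip_file.writestr('Batch.xml', xml_file)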
    
    for i, chunk in enumerate(chunks):
        print(f'chunk {i}: {len(chunk.getvalue())}')
    

    Output:

    chunk 0: 1000000
    chunk 1: 1000000
    chunk 2: 1000000
    chunk 3: 1000000
    chunk 4: 1000000
    chunk 5: 887260
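
To check that a list of chunks like the one above still forms a valid archive, the pieces can be concatenated in order and opened with zipfile again. The following is a minimal sketch using only the standard library. It assumes the chunks are plain consecutive slices of the archive's byte stream, and it builds its own stand-in `chunks` list by slicing a small in-memory archive rather than using SplitFileWriter. The helper names `split_bytes` and `join_chunks`, the placeholder payload, and the 100-byte chunk size are illustrative assumptions, not part of the original script.

```python
import zipfile
from io import BytesIO


def split_bytes(data, chunk_size):
    """Slice bytes into consecutive BytesIO pieces of at most chunk_size bytes.

    Hypothetical helper, standing in for the chunks produced by the script above.
    """
    return [BytesIO(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]


def join_chunks(chunks):
    """Concatenate the contents of the BytesIO chunks in list order."""
    return b"".join(chunk.getvalue() for chunk in chunks)


# Build a small archive in memory (placeholder payload, not the real Batch.xml).
payload = b"<Batch>" + b"<Item/>" * 1000 + b"</Batch>"
archive = BytesIO()
with zipfile.ZipFile(archive, "w") as zip_file:
    zip_file.writestr("Batch.xml", payload)

# Stand-in for the chunk list: cut the archive every 100 bytes.
chunks = split_bytes(archive.getvalue(), 100)

# Rejoin the pieces and read the member back out of the restored archive.
with zipfile.ZipFile(BytesIO(join_chunks(chunks))) as zip_file:
    restored = zip_file.read("Batch.xml")

print(f"{len(chunks)} chunks, round trip ok: {restored == payload}")
```

If the whole archive fits in memory, slicing `getvalue()` as in `split_bytes` is also a simple way to produce byte chunks without a streaming writer. The generator approach above avoids holding a second full copy of the data.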