
How do I copy a chunk of a binary file in Python?


I have a big binary file (60GB) that I want to split into several smaller files. I iterated over the file and found the points at which I want to split it using the fileObject.tell() method, so now I have an array of 1000 split points called file_pointers. I am looking for a way to create files out of those split points, so the function would look like:

def split_file(file_object, file_pointers):
     # Do something here

and it would create a file for every chunk. I saw this question, but I am afraid looping in Python could be too slow, and I also feel like there must be some kind of built-in function that does something similar.


Solution

  • This is a lot simpler than I thought, but I will post my answer here in case anyone wants a quick solution. Here is an example of copying the chunk between file_pointers[1] and file_pointers[2]:

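    with open('train_example.bson', 'rb') as fbson:
        # Jump to the start of the chunk, then read exactly its length in bytes.
        fbson.seek(file_pointers[1])
        bytes_chunk = fbson.read(file_pointers[2] - file_pointers[1])
        # Write the chunk to its own file; this holds the whole chunk in memory.
        with open('tmp.bson', 'wb') as output_file:
            output_file.write(bytes_chunk)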
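
To cover every split point rather than a single pair, the same seek-and-read idea can be wrapped in the split_file function from the question. The version below is only a sketch under a few assumptions that are not in the original post: file_pointers is sorted in ascending order and includes both the first offset and the end-of-file offset, and the out_dir parameter, the buffer_size parameter, and the chunk_0000.bin naming scheme are all made up for illustration. It copies each chunk in fixed-size blocks, so a large chunk never has to fit in memory at once.

```python
import os


def split_file(file_object, file_pointers, out_dir, buffer_size=1024 * 1024):
    """Write the bytes between each pair of consecutive offsets to its own file.

    Assumes file_pointers is sorted ascending and that file_object was opened
    in binary mode. out_dir, buffer_size and the file names are hypothetical.
    """
    paths = []
    # Pair each offset with the next one: (p0, p1), (p1, p2), ...
    for index, (start, end) in enumerate(zip(file_pointers, file_pointers[1:])):
        path = os.path.join(out_dir, f"chunk_{index:04d}.bin")
        file_object.seek(start)
        remaining = end - start
        with open(path, "wb") as output_file:
            # Copy in blocks of at most buffer_size bytes.
            while remaining > 0:
                block = file_object.read(min(buffer_size, remaining))
                if not block:
                    break  # reached end of file before the expected offset
                output_file.write(block)
                remaining -= len(block)
        paths.append(path)
    return paths
```

It could be called as split_file(fbson, file_pointers, 'chunks') inside the with block above, provided the chunks directory already exists. The Python-level loop runs once per block rather than once per byte, so with a buffer of around a megabyte the time is likely dominated by disk I/O rather than by the interpreter.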