I have a big binary file (60 GB) that I want to split into several smaller files. I iterated over the file and recorded the offsets at which I want to split it using the fileObject.tell() method, so I now have an array of 1000 split points called file_pointers. I am looking for a way to create files from those split points, so the function would look like:
def split_file(file_object, file_pointers):
    # Do something here
and it would create a file for every chunk. I saw this question, but I am afraid Python's looping could be too slow, and I also feel like there must be some kind of built-in function that does something similar.
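For concreteness, the brute-force version I was hoping to avoid would look roughly like this (a rough sketch; the output filename pattern is made up, and I'm assuming file_pointers is sorted and marks the start of each chunk):

def split_file(file_object, file_pointers):
    # Copy every [start, end) range into its own numbered output file.
    # 'part_%04d.bson' is just a placeholder name.
    for i in range(len(file_pointers) - 1):
        start, end = file_pointers[i], file_pointers[i + 1]
        file_object.seek(start)
        with open('part_%04d.bson' % i, 'wb') as out:
            out.write(file_object.read(end - start))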
This turned out to be a lot simpler than I thought, but I will post my answer here just in case anyone wants a quick solution. Here is an example of copying the chunk from file_pointers[1] to file_pointers[2]:
with open('train_example.bson', 'rb') as fbson:
    # Jump to the start of the chunk and read exactly its length
    fbson.seek(file_pointers[1])
    bytes_chunk = fbson.read(file_pointers[2] - file_pointers[1])

with open('tmp.bson', 'wb') as output_file:
    output_file.write(bytes_chunk)
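To split the whole file, the same idea just needs a loop over consecutive split points. Here is a minimal sketch (the chunk filenames and the 64 MB buffer size are arbitrary choices of mine; reading each chunk in bounded pieces keeps memory usage flat even if a single chunk is several gigabytes, and I append the file size so the last chunk is closed off):

import os

def split_file(path, file_pointers, buffer_size=64 * 1024 * 1024):
    # Assumes file_pointers holds the start offset of each chunk;
    # the file size is appended to terminate the final chunk.
    points = list(file_pointers) + [os.path.getsize(path)]
    with open(path, 'rb') as src:
        for i in range(len(points) - 1):
            src.seek(points[i])
            remaining = points[i + 1] - points[i]
            with open('chunk_%04d.bson' % i, 'wb') as dst:
                # Copy the chunk in bounded pieces instead of one huge read()
                while remaining > 0:
                    piece = src.read(min(buffer_size, remaining))
                    if not piece:
                        break
                    dst.write(piece)
                    remaining -= len(piece)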