I have a large generic Python object which I have no information about. I need to split this object into smaller chunks for storage needs.
Hope someone can help, Omer.
Pickle it and split the resulting data.
You can't serialize only "a part" of an object, because in the general case there is no such thing as "a part of an object". Splitting it into parts requires knowledge of its internals, which you stated you don't have.
However, you can use pickle.dump (which writes to a file-like object) and pass it a custom file-like object that splits the resulting data into chunks as it receives it.
E.g. here's a file-like object that writes data to files in chunks of 2 GiB by default (in the example, I set the chunk size to 4 MiB instead):
class SplitFile(object):
    """Write-only file-like object that spreads its data over numbered chunk files."""

    def __init__(self, name_pattern, chunk_size=2*1024**3):
        self.name_pattern = name_pattern  # e.g. "data.part%02d.pickle"
        self.chunk_size = chunk_size
        self.file = None
        self.part = -1
        self.offset = None

    def write(self, data):
        if not self.file:
            self._split()
        while True:
            l = len(data)
            # Write only as much as still fits into the current chunk.
            wl = min(l, self.chunk_size - self.offset)
            self.file.write(data[:wl])
            self.offset += wl
            if wl == l:
                break
            # The current chunk is full: start the next one and continue.
            self._split()
            data = data[wl:]

    def _split(self):
        # Close the current chunk (if any) and open the next one.
        if self.file:
            self.file.close()
        self.part += 1
        self.file = open(self.name_pattern % self.part, "wb")
        self.offset = 0

    def close(self):
        if self.file:
            self.file.close()
            self.file = None

    def __del__(self):
        self.close()
import pickle
import random

big_object = [random.random() for _ in range(1000000)]
dest = SplitFile("data.part%02d.pickle", 4*1024**2)
pickle.dump(big_object, dest)
dest.close()  # flush and close the last chunk
After running the example, we have:
$ ls -l *.pickle
-rwxrwx---+ 1 Sasha None 4194304 Dec 4 16:02 data.part00.pickle
-rwxrwx---+ 1 Sasha None 4194304 Dec 4 16:02 data.part01.pickle
-rwxrwx---+ 1 Sasha None 4194304 Dec 4 16:02 data.part02.pickle
-rwxrwx---+ 1 Sasha None 4194304 Dec 4 16:02 data.part03.pickle
-rwxrwx---+ 1 Sasha None 4194304 Dec 4 16:02 data.part04.pickle
-rwxrwx---+ 1 Sasha None 294912 Dec 4 16:02 data.part05.pickle
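To load the object back, the chunks have to be presented to pickle.load as one continuous stream again. Here is a minimal sketch of one way to do that, assuming Python 3 and assuming the chunk names sort lexicographically in write order (true for the zero-padded "%02d" pattern as long as there are fewer than 100 parts). The names JoinedFile and load_split are hypothetical helpers made up for this illustration; they are not part of pickle.

```python
import glob
import io
import pickle


class JoinedFile(io.RawIOBase):
    """Read-only raw stream that presents several chunk files as one (hypothetical helper)."""

    def __init__(self, paths):
        super().__init__()
        self._paths = iter(paths)
        self._current = None

    def readable(self):
        return True

    def readinto(self, buffer):
        while True:
            if self._current is None:
                path = next(self._paths, None)
                if path is None:
                    return 0  # no chunks left: signal end of stream
                self._current = open(path, "rb")
            n = self._current.readinto(buffer)
            if n:
                return n
            # The current chunk is exhausted; move on to the next one.
            self._current.close()
            self._current = None

    def close(self):
        if self._current is not None:
            self._current.close()
            self._current = None
        super().close()


def load_split(pattern):
    """Unpickle an object from all chunk files matching a glob pattern."""
    # Assumption: sorting the file names restores the order in which they were written.
    paths = sorted(glob.glob(pattern))
    # BufferedReader supplies the read() and readline() methods that pickle.load needs.
    with io.BufferedReader(JoinedFile(paths)) as source:
        return pickle.load(source)
```

With the files from the example above, `big_object = load_split("data.part*.pickle")` should give the list back. Wrapping the raw stream in io.BufferedReader avoids implementing readline() by hand across chunk boundaries.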