So I am writing a program that creates a picture of the Mandelbrot set, and I have been incrementally making it better. Right now, each process that is spawned writes some data to a temporary file, which is used later on to put the picture together. However, the temporary files are a lot bigger than the actual picture itself, and I don't have any ideas on how to make them smaller. How do I efficiently write integer data to a file, and get it back? I intend to eventually make this very scalable, so I would need to be able to write arbitrarily long integers for the pixel indices, but the color data is always going to be three integers that have a max value of 255. Here is my code:
import multiprocessing
def pixproc(y0, yn, xsteps, ysteps, fname):
    XMIN, YMIN = -2., -1.
    XLEN, YLEN = 3, 2
    with open(fname, 'w') as f:
        for y in xrange(y0, yn):
            print y
            for x in xrange(xsteps):
                c=complex(XMIN + XLEN*(1.*x/xsteps),
                          YMIN + YLEN*(1.*y/ysteps))
                k=c
                for i in xrange(256):
                    k = k*k + c
                    if abs(k)>2: break
                if 0<i<32:
                    #print 'Success!', i
                    print >>f, x, y, 8*i, 0, 0      #This is that part of
                if 32<=i<255:                       #my code that I am trying
                    #print 'Success!', i            #to improve. The rest of
                    print >>f, x, y, 255, i, i      #the code is given for context
    return                                          #and isn't relevant to my question
def main(xsteps, ysteps):
    pool = multiprocessing.Pool()
    n = multiprocessing.cpu_count()
    step = height / n
    fnames = ["temp" + str(i) for i in xrange(n)]
    for i in xrange(n):
        pool.apply_async(pixproc,
                         (step*i,
                          step*(i+1),
                          xsteps,
                          ysteps,
                          fnames[i]))
    pool.close()
    pool.join()
    return fnames
if __name__=="__main__":
    from PIL import Image
    import sys
    width, height = map(int, sys.argv[1:])
    picname = "mandelbrot1.png"
    fnames = main(width, height)
    im = Image.new("RGB", (width, height))
    pp = im.load()
    for name in fnames:
        with open(name) as f:
            for line in f:
                line = map(int, line.rstrip('\n').split(' '))
                pp[line[0], line[1]] = line[2], line[3], line[4]
    im.save(picname)
When I try to make a picture that is 3000x2000, the actual picture is 672 KB, but the temporary files are both close to 30 MB! Can someone suggest a better way to store the data in files? (The important part is in the function pixproc)
Assuming you're just trying to eliminate the overhead of using a text-based format instead of a binary format for your temporary data, and you don't want to rewrite everything to use numpy, there are a few different solutions:
First, you can keep the data in binary format in the first place: mmap the file, and use ctypes to treat it as a giant record of some kind. This is usually more trouble than it's worth, but it's worth mentioning.
Assuming your data is nothing but a long list of tuples of 5 bytes:
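import ctypes
import mmap
class Entry(ctypes.Structure):
    _fields_ = [("x", ctypes.c_uint8), ("y", ctypes.c_uint8),
                ("i", ctypes.c_uint8), ("j", ctypes.c_uint8), ("k", ctypes.c_uint8)]
count = xsteps * (yn - y0)  # upper bound: one entry per pixel in this band
with open(fname, 'w+b') as f:  # the mapping needs a file opened for reading and writing
    f.truncate(ctypes.sizeof(Entry) * count)  # mmap cannot map an empty file
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE)  # length 0 maps the whole file
    entries = (Entry * count).from_buffer(m)  # use like an array: entries[n].x = ...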
Second, you can use struct. You'll have to read the docs for complete details, but I'll give one example. Let's take this line:
print >>f, x, y, 8*i, 0, 0
Now, let's assume that all 5 of those are guaranteed to be bytes (0-255). You can just do:
f.write(struct.pack('BBBBB', x, y, 8*i, 0, 0))
To read them back later:
x, y, i8, _, _ = struct.unpack('BBBBB', f.read(struct.calcsize('BBBBB')))
i = i8//8
If any of them needs to be longer than a byte, you need to deal with endianness, but that's pretty trivial. For example, if x and y range from -32768 to 32767:
f.write(struct.pack('>hhBBB', x, y, 8*i, 0, 0))
And make sure to open the file in binary mode.
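Putting the write and read halves together, a minimal round-trip sketch might look like the following. The helper names write_pixels and read_pixels are hypothetical, and an in-memory io.BytesIO stands in for a temporary file opened in binary mode; the record layout is the '>hhBBB' format from the example above.

```python
import io
import struct

# Same layout as the example above: two signed 16-bit ints plus three bytes.
# The '>' prefix means big-endian with no padding, so each record is 7 bytes.
RECORD = struct.Struct('>hhBBB')

def write_pixels(f, pixels):
    """Write an iterable of (x, y, r, g, b) tuples to a binary file object."""
    for x, y, r, g, b in pixels:
        f.write(RECORD.pack(x, y, r, g, b))

def read_pixels(f):
    """Yield (x, y, r, g, b) tuples until the file runs out of whole records."""
    while True:
        chunk = f.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break
        yield RECORD.unpack(chunk)

# BytesIO is only a stand-in here; a real run would use open(fname, 'wb')
# for writing and open(fname, 'rb') for reading.
buf = io.BytesIO()
pixels = [(0, 0, 8, 0, 0), (2999, 1999, 255, 40, 40)]
write_pixels(buf, pixels)
buf.seek(0)
restored = list(read_pixels(buf))
```

Each record here takes 7 bytes regardless of how many digits the values have, whereas the text format spends one character per digit plus separators and a newline.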
And you can of course combine this with mmap if you want, which means you can just use struct.pack_into and struct.unpack_from instead of explicitly using pack plus write and unpack plus read.
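As a rough sketch of that combination, the following maps a pre-sized temporary file and packs records straight into the mapping. The pixel values and the use of tempfile are assumptions for illustration; in the real program the file would be the per-process temp file and the record count would have to be known (or over-estimated) up front.

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct('>hhBBB')  # 7 bytes per (x, y, r, g, b) record
pixels = [(0, 0, 8, 0, 0), (1, 0, 16, 0, 0), (2999, 1999, 255, 40, 40)]

# Hypothetical scratch file, created only for this demonstration.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, 'w+b') as f:
        # The file must already have its final size before it is mapped.
        f.truncate(RECORD.size * len(pixels))
        m = mmap.mmap(f.fileno(), 0)  # length 0 maps the whole file
        for n, pixel in enumerate(pixels):
            # Write record n directly into the mapping at its byte offset.
            RECORD.pack_into(m, n * RECORD.size, *pixel)
        m.flush()
        # Read the records back from the same mapping.
        restored = [RECORD.unpack_from(m, n * RECORD.size)
                    for n in range(len(pixels))]
        m.close()
finally:
    os.remove(path)
```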
Next, there's pickle. Either directly create your list and just pickle.dump it, or manually pickle.dumps each entry and add some simple higher-level structure above that (or just use shelve, if that higher-level structure is, or could be, a simple mapping from keys to entries). This may be larger instead of smaller, and it may be slower, so you always want to do some testing before considering this. But sometimes it's a simple solution.
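One way to do that testing is to serialize the same sample data each way and compare the byte counts. This is only a sketch with made-up sample pixels; the results depend on the data and the pickle protocol, so measure with your own values.

```python
import pickle
import struct

# Made-up sample data: 1000 (x, y, r, g, b) tuples.
pixels = [(x, 0, 255, x % 256, x % 256) for x in range(1000)]

# The original text format: space-separated decimals, one record per line.
as_text = ''.join('%d %d %d %d %d\n' % p for p in pixels).encode('ascii')

# Fixed-size binary records via struct (7 bytes each).
as_struct = b''.join(struct.pack('>hhBBB', *p) for p in pixels)

# The whole list pickled in one go with the newest available protocol.
as_pickle = pickle.dumps(pixels, protocol=pickle.HIGHEST_PROTOCOL)

print(len(as_text), len(as_struct), len(as_pickle))

# Unpickling gives back the list of tuples unchanged.
restored = pickle.loads(as_pickle)
```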
Finally, you can probably come up with a more compact text format than just printing the str representation of each object. This is usually not worth the effort, but again, it's worth thinking about.
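For illustration only, one such format could write the coordinates in hexadecimal and pack the three color bytes into six hex digits with no separators. The encode and decode helpers below are hypothetical, and the saving over plain decimal text is modest compared with a binary format.

```python
def encode(x, y, r, g, b):
    """Format one pixel as 'x y rrggbb' in hexadecimal, newline-terminated."""
    return '%x %x %02x%02x%02x\n' % (x, y, r, g, b)

def decode(line):
    """Parse a line produced by encode back into (x, y, r, g, b)."""
    xs, ys, rgb = line.split()
    return (int(xs, 16), int(ys, 16),
            int(rgb[0:2], 16), int(rgb[2:4], 16), int(rgb[4:6], 16))

line = encode(2999, 1999, 255, 40, 40)   # 'bb7 7cf ff2828\n'
pixel = decode(line)
```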