I am not at all familiar with Python, I usually do Ruby or JS. But I need to write a benchmarking script on a system that runs Python. What I'm trying to do is create a small script that gets a file size and the number of threads and write a random buffer. This is what I got after 2 hours of fiddling:
from multiprocessing import Pool
import os, sys
def writeBuf(buf):
def write(n):
f = open(os.path.join(directory, 'n' + str(n)), 'w')
try:
f.write(buf)
f.flush()
os.fsync(f.fileno)
finally:
f.close()
return write
if __name__ == '__main__':
targetDir = sys.argv[1]
numThreads = int(sys.argv[2])
numKiloBytes = int(sys.argv[3])
numFiles = int(102400 / numKiloBytes)
buf = os.urandom(numKiloBytes * 1024)
directory = os.path.join(targetDir, str(numKiloBytes) + 'k')
if not os.path.exists(directory):
os.makedirs(directory)
with Pool(processes=numThreads) as pool:
pool.map(writeBuf(buf), range(numFiles))
But it throws the error: AttributeError: Can't pickle local object 'writeBuf.<locals>.write'
I have tried before to use write
without the closure, but I got an error when I tried to define the function inside the __name__ == '__main__'
part. Omitting that if
also lead to an error and I read that it is required for Pool
to work.
What's supposed to be just a tiny script turned into a huge ordeal, can anyone point me the right way?
In theory, python can't pickle functions. (for details, see Can't pickle Function)
In practice, python pickles a function's name and module so that passing a function will work. In your case though, the function that you're trying to pass is a local variable returned by writeBuf
.
Instead:
writeBuf
wrapper.write
function's closure (buf
and directory
), instead give write
everything it needs as a parameter.Result:
def write(args):
directory, buf, n = args
with open(os.path.join(directory, 'n' + str(n)), 'w') as f:
# might as well use with-statements ;)
f.write(buf)
f.flush()
os.fsync(f.fileno)
if __name__ == '__main__':
...
with Pool(processes=numThreads) as pool:
nargs = [(directory, buf, n) for n in range(numFiles)]
pool.map(write, nargs)