I'm trying to find a way to make the following algorithm being processed on multiple cores, but I don't get on a good point. Using a locked iterator shared between multiple processes wouldn't be the most efficient way I think.
def sortCharset(set):
_set = ""
for c in set:
if c not in _set:
_set += c
set = _set
del _set
set = list(set)
set.sort()
return "".join(set)
def stringForInt(num, set, length):
setLen = len(set)
string = ""
string += set[num % setLen]
for n in xrange(1,length):
num //= setLen
string += set[num % setLen]
return string
def bruteforce(set, length, raw = False):
if raw is False:
set = sortCharset(set)
for n in xrange(len(set) ** length):
yield stringForInt(n, set, length)
Short explanation: The code is used to create every possible combination from a set of chars, i.e. to hack a password. (Of course not my intention, just some Py-training. ;-)
What is a good way to run this algorithm on multiple cores ?
The question isn't really about naming style or how to get a sorted set of characters out of a string.
You might want to look into the multiprocessing module. I'm pretty much a n00b w/r/t multi-core parallelism but got something working:
import multiprocessing, itertools
def stringForInt(args):
num, charset, length = args ## hack hack hack
setlen = len(charset)
s = []
s.append(charset[num % setlen])
for n in xrange(1, length):
num //= setlen
s.append(charset[num % setlen])
return ''.join(s)
def bruteforce(charset, length, mapper, raw=False):
if not raw:
charset = sorted(set(charset))
return mapper(stringForInt, ((n,charset,length) for n in xrange(len(charset)**length)))
if __name__ == '__main__':
import time, sys
if len(sys.argv) == 1 or sys.argv[1] == 'map':
mapper = map
else:
p = multiprocessing.Pool()
pfunc = {'pmap':p.map,
'imap':p.imap,
'imapu':p.imap_unordered}[sys.argv[1]]
mapper = lambda f, i: pfunc(f, i, chunksize=5)
o = bruteforce('abcdefghijk',6,mapper)
if not isinstance(o, list):
list(o)
The nature of the hack is that you need to use pickleable objects for the functions in multiprocessing
and only functions that are defined at the top-level are can be pickled. (There would be other ways around this using multiprocessing.Value
or multiprocessing.Manager
but they aren't really worth going into for present purposes.)
Here's output for various runs:
$ for x in map pmap imap imapu ; do time python mp.py $x; done
real 0m9.351s
user 0m9.253s
sys 0m0.096s
real 0m10.523s
user 0m20.753s
sys 0m0.176s
real 0m4.081s
user 0m13.797s
sys 0m0.276s
real 0m4.215s
user 0m14.013s
sys 0m0.236s