Search code examples
pythonmultiprocess

Implementing Multicore Threading on this algorithm


I'm trying to find a way to make the following algorithm being processed on multiple cores, but I don't get on a good point. Using a locked iterator shared between multiple processes wouldn't be the most efficient way I think.

 def sortCharset(set):
   _set = ""
   for c in set:
     if c not in _set:
       _set += c
   set = _set
   del _set
   set = list(set)
   set.sort()
   return "".join(set)
 
 
 def stringForInt(num, set, length):
   setLen = len(set)
   string = ""
   string += set[num % setLen]
   for n in xrange(1,length):
     num //= setLen
     string += set[num % setLen]
   return string
 
 
 def bruteforce(set, length, raw = False):
   if raw is False:
     set = sortCharset(set)
 
   for n in xrange(len(set) ** length):
     yield stringForInt(n, set, length)

Short explanation: The code is used to create every possible combination from a set of chars, i.e. to hack a password. (Of course not my intention, just some Py-training. ;-)

What is a good way to run this algorithm on multiple cores ?


Solution

  • The question isn't really about naming style or how to get a sorted set of characters out of a string.

    You might want to look into the multiprocessing module. I'm pretty much a n00b w/r/t multi-core parallelism but got something working:

    import multiprocessing, itertools
    
    def stringForInt(args):
        num, charset, length = args ## hack hack hack
        setlen = len(charset)
        s = []
        s.append(charset[num % setlen])
        for n in xrange(1, length):
            num //= setlen
            s.append(charset[num % setlen])
        return ''.join(s)
    
    def bruteforce(charset, length, mapper, raw=False):
        if not raw:
            charset = sorted(set(charset))
        return mapper(stringForInt, ((n,charset,length) for n in xrange(len(charset)**length)))
    
    if __name__ == '__main__':
        import time, sys
        if len(sys.argv) == 1 or sys.argv[1] == 'map':
            mapper = map
        else:
            p = multiprocessing.Pool()
            pfunc = {'pmap':p.map,
                     'imap':p.imap,
                     'imapu':p.imap_unordered}[sys.argv[1]]
            mapper = lambda f, i: pfunc(f, i, chunksize=5)
        o = bruteforce('abcdefghijk',6,mapper)
        if not isinstance(o, list):
            list(o)
    

    The nature of the hack is that you need to use pickleable objects for the functions in multiprocessing and only functions that are defined at the top-level are can be pickled. (There would be other ways around this using multiprocessing.Value or multiprocessing.Manager but they aren't really worth going into for present purposes.)

    Here's output for various runs:

    $ for x in map pmap imap imapu ; do time python mp.py $x; done
    
    real    0m9.351s
    user    0m9.253s
    sys     0m0.096s
    
    real    0m10.523s
    user    0m20.753s
    sys     0m0.176s
    
    real    0m4.081s
    user    0m13.797s
    sys     0m0.276s
    
    real    0m4.215s
    user    0m14.013s
    sys     0m0.236s