Search code examples
pythonoptimizationpermutationpython-itertoolspypy

Optimize permutations search loop (can't use itertools) that is extremely slow. Any suggestions?


This is a game where you have 12 cards and you pick you until you choose 3 from the same group. I am attempting to find the probability of choosing each group. The script that I have created works, but it is extremely slow. My coworker created a similar script in R without the functions and his script takes 1/100th the time that mine takes. I am just trying to figure out why. Any ideas would be greatly appreciated.

from collections import Counter
import pandas as pd
from datetime import datetime

weight = pd.read_excel('V01Weights.xlsx')

Weight looks like the following:

Symb    Weight
Grand   170000
Grand   170000
Grand   105
Major   170000
Major   170000
Major   215
Minor   150000
Minor   150000
Minor   12000
Bonus   105000
Bonus   105000
Bonus   105000

Max Picks represents the total number of different "cards". Total Picks represents the max number of user choices. This is because after 8 choices, you are guaranteed to have 2 of each type so on the 9th pick, you are guaranteed to have 3 matching.

TotalPicks = 9
MaxPicks = 12

This should have been named PickedProbabilities.

Picks = {0:0,1:0,2:0,3:0}

This is my simple version of the timeit class because I don't like the timeit class

def Time_It(function):
    start =datetime.now()

    x = function()

    finish = datetime.now()

    TotalTime = finish - start

    Minutes = int(TotalTime.seconds/60)
    Seconds = TotalTime.seconds % 60


    print('It took ' + str(Minutes) + ' minutes and ' + str(Seconds) + ' seconds')

    return(x)

Given x(my picks in order) I find the probability. These picks are done without replacement

def Get_Prob(x,weight):
    prob = 1
    weights = weight.iloc[:,1]

    for index in x:
        num = weights[index]
        denom = sum(weights)

        prob *= num/denom
        weights.drop(index, inplace = True)
        # print(weights)

    return(prob)

This is used to determine if there are duplicates in my loop because that is not allowed

def Is_Allowed(x):
    return(len(x) == len(set(x)))

This determines if a win is present in all of the cards present thus far.

def Is_Win(x):
    global Picks

    WinTypes = [[0,1,2],[3,4,5],[6,7,8],[9,10,11]]

    IsWin = False

    for index,item in enumerate(WinTypes):
        # print(index)
        if set(item).issubset(set(x)):
            IsWin = True
            Picks[index] += Get_Prob(x,weight)
            # print(Picks[index])
            print(sum(Picks.values()))
            break

    return(IsWin)

This is my main function that cycles through all of the cards. I attempted to do this using recursion but I eventually gave up. I can't use itertools to create all of the permutations because for example [0,1,2,3,4] will be created by itertools but this is not possible because once you get 3 matching, the game ends.

def Cycle():

    for a in range(MaxPicks):
        x = [a]

        for b in range(MaxPicks):
            x = [a,b]

            if Is_Allowed(x):
                for c in range(MaxPicks):
                    x = [a,b,c]
                    if Is_Allowed(x):                    
                        if Is_Win(x):
                            # print(x)
                            continue
                        for d in range(MaxPicks):
                            x = [a,b,c,d]
                            if Is_Allowed(x):
                                if Is_Win(x):
                                    # print(x)
                                    continue
                            for e in range(MaxPicks):
                                x = [a,b,c,d,e]
                                if Is_Allowed(x):
                                    if Is_Win(x):
                                        continue
                                for f in range(MaxPicks):
                                    x = [a,b,c,d,e,f]
                                    if Is_Allowed(x):
                                        if Is_Win(x):
                                            continue
                                    for g in range(MaxPicks):
                                        x = [a,b,c,d,e,f,g]
                                        if Is_Allowed(x):
                                            if Is_Win(x):
                                                continue
                                        for h in range(MaxPicks):
                                            x = [a,b,c,d,e,f,g,h]
                                            if Is_Allowed(x):
                                                if Is_Win(x):
                                                    continue
                                            for i in range(MaxPicks):
                                                if Is_Allowed(x):
                                                    if Is_Win(x):
                                                        continue

Calls the main function

x = Time_It(Cycle)
print(x)

writes the probabilities to a text file

with open('result.txt','w') as file:
    # file.write(pickle.dumps(x))
    for item in x:
        file.write(str(item) + ',' + str(x[item]) + '\n')

Solution

  • Ok, this time I hope I got your problem right:)

    There are two insights (I guess you have them, just for the sake of the completeness) needed in order to speed up your program algorithmically:

    1. The probabilities for the sequence (card_1, card_2) and (card_2, card_1) are not equal, so we cannot use the results from the urn problem, and it looks like we need to try out all permutations.
    2. However, given a set of cards we picked so far, we don't really need the information in which sequence they where picked - it is all the same for the future course of the game. So it is enough to use dynamic programming and calculate the probabilities for every subset to be traversed during the game (thus we need to check 2^N instead of N! states).

    For a set of picked cards set the probability to pick a card i in the next turn is:

     norm:=sum Wi for i in set 
     P(i|set)=Wi/norm if i not in set else 0.0
    

    The recursion for calculating P(set) - the probability that a set of picked card occured during the game is:

    set_without_i:=set/{i}
    P(set)=sum P(set_without_i)*P(i|set_without_i) for i in set
    

    However this should be done only for set_without_i for which the game not ended yet, i.e. no group has 3 cards picked.

    This can be done by means of recursion+memoization or, as my version does, by using bottom-up dynamic programming. It also uses binary representation of integers for representations of sets and (most important part!) returns the result almost instantly [('Grand', 0.0014104762718021384), ('Major', 0.0028878988709489244), ('Minor', 0.15321793072867956), ('Bonus', 0.84248369412856905)]:

    #calculates probability to end the game with 3 cards of a type
    
    
    N=12
    
    #set representation int->list
    def decode_set(encoded):
        decoded=[False]*N
        for i in xrange(N):
            if encoded&(1<<i):
                decoded[i]=True
        return decoded
    
    weights = [170000, 170000, 105, 170000, 170000, 215, 150000, 150000, 12000, 105000, 105000, 105000]     
    def get_probs(decoded_set):
        denom=float(sum((w for w,is_taken in zip(weights, decoded_set) if not is_taken)))
        return [w/denom if not is_taken else 0.0 for w,is_taken in zip(weights, decoded_set)]
    
    def end_group(encoded_set):
        for i in xrange(4):
           whole_group =  7<<(3*i) #7=..000111, 56=00111000 and so on
           if (encoded_set & whole_group)==whole_group:
               return i
        return None
    
    
    #MAIN: dynamic program:
    
    MAX=(1<<N)#max possible set is 1<<N-1
    probs=[0.0]*MAX
    
    #we always start with the empty set:
    probs[0]=1.0    
    #building bottom-up
    for current_set in xrange(MAX):
        if end_group(current_set) is None:  #game not ended yet!
           decoded_set=decode_set(current_set)
           trans_probs=get_probs(decoded_set)
           for i, is_set in enumerate(decoded_set):
               if not is_set:
                  new_set=current_set | (1<<i) 
                  probs[new_set]+=probs[current_set]*trans_probs[i]
    
    #filtering wins:
    group_probs=[0.0]*4
    for current_set in xrange(MAX):
       group_won=end_group(current_set)
       if group_won is not None:
          group_probs[group_won]+=probs[current_set]
    
    
    print zip(["Grand", "Major", "Minor", "Bonus"], group_probs)
    

    Some explanation of the "tricks" used in code:

    A pretty standard trick is to use integer's binary representation to encode a set. Let's say we have objects [a,b,c], so we could represent the set {b,c} as 110, which would mean a (first in the list corresponds to 0- the lowest digit) - not in the set, b(1) in the set, c(1) in the set. However, 110 read as integer it is 6.

    The current_set - for loop simulates the game and best understood while playing. Let's play with two cards [a,b] with weights [2,1].

    We start the game with an empty set, 0 as integer, so the probability vector (given set, its binary representation and as integer mapped onto probability):

      probs=[{}=00=0->1.0,  01={a}=1->0.0, {b}=10=2->0.0, {a,b}=11=3->0.0]
    

    We process the current_set=0, there are two possibilities 66% to take card a and 33% to take cardb, so the probabilities become after the processing:

     probs=[{}=00=0->1.0,  01={a}=1->0.66, {b}=10=2->0.33, {a,b}=11=3->0.0]
    

    Now we process the current_set=1={a} the only possibility is to take b so we will end with set {a,b}. So we need to update its (3={a,b}) probability via our formula and we get:

     probs=[{}=00=0->1.0,  01={a}=1->0.66, {b}=10=2->0.33, {a,b}=11=3->0.66]
    

    In the next step we process 2, and given set {b} the only possibility is to pick card a, so probability of set {a,b} needs to be updated again

    probs=[{}=00=0->1.0,  01={a}=1->0.66, {b}=10=2->0.33, {a,b}=11=3->1.0]
    

    We can get to {a,b} on two different paths - this could be seen in our algorithm. The probability to go through set {a,b} at some point in our game is obviously 1.0.

    Another important thing: all paths that leads to {a,b} are taken care of before we process this set (it would be the next step).