This is a game where you have 12 cards and you pick you until you choose 3 from the same group. I am attempting to find the probability of choosing each group. The script that I have created works, but it is extremely slow. My coworker created a similar script in R without the functions and his script takes 1/100th the time that mine takes. I am just trying to figure out why. Any ideas would be greatly appreciated.
from collections import Counter
import pandas as pd
from datetime import datetime
weight = pd.read_excel('V01Weights.xlsx')
Weight looks like the following:
Symb Weight
Grand 170000
Grand 170000
Grand 105
Major 170000
Major 170000
Major 215
Minor 150000
Minor 150000
Minor 12000
Bonus 105000
Bonus 105000
Bonus 105000
Max Picks represents the total number of different "cards". Total Picks represents the max number of user choices. This is because after 8 choices, you are guaranteed to have 2 of each type so on the 9th pick, you are guaranteed to have 3 matching.
TotalPicks = 9
MaxPicks = 12
This should have been named PickedProbabilities.
Picks = {0:0,1:0,2:0,3:0}
This is my simple version of the timeit class because I don't like the timeit class
def Time_It(function):
start =datetime.now()
x = function()
finish = datetime.now()
TotalTime = finish - start
Minutes = int(TotalTime.seconds/60)
Seconds = TotalTime.seconds % 60
print('It took ' + str(Minutes) + ' minutes and ' + str(Seconds) + ' seconds')
return(x)
Given x(my picks in order) I find the probability. These picks are done without replacement
def Get_Prob(x,weight):
prob = 1
weights = weight.iloc[:,1]
for index in x:
num = weights[index]
denom = sum(weights)
prob *= num/denom
weights.drop(index, inplace = True)
# print(weights)
return(prob)
This is used to determine if there are duplicates in my loop because that is not allowed
def Is_Allowed(x):
return(len(x) == len(set(x)))
This determines if a win is present in all of the cards present thus far.
def Is_Win(x):
global Picks
WinTypes = [[0,1,2],[3,4,5],[6,7,8],[9,10,11]]
IsWin = False
for index,item in enumerate(WinTypes):
# print(index)
if set(item).issubset(set(x)):
IsWin = True
Picks[index] += Get_Prob(x,weight)
# print(Picks[index])
print(sum(Picks.values()))
break
return(IsWin)
This is my main function that cycles through all of the cards. I attempted to do this using recursion but I eventually gave up. I can't use itertools to create all of the permutations because for example [0,1,2,3,4] will be created by itertools but this is not possible because once you get 3 matching, the game ends.
def Cycle():
for a in range(MaxPicks):
x = [a]
for b in range(MaxPicks):
x = [a,b]
if Is_Allowed(x):
for c in range(MaxPicks):
x = [a,b,c]
if Is_Allowed(x):
if Is_Win(x):
# print(x)
continue
for d in range(MaxPicks):
x = [a,b,c,d]
if Is_Allowed(x):
if Is_Win(x):
# print(x)
continue
for e in range(MaxPicks):
x = [a,b,c,d,e]
if Is_Allowed(x):
if Is_Win(x):
continue
for f in range(MaxPicks):
x = [a,b,c,d,e,f]
if Is_Allowed(x):
if Is_Win(x):
continue
for g in range(MaxPicks):
x = [a,b,c,d,e,f,g]
if Is_Allowed(x):
if Is_Win(x):
continue
for h in range(MaxPicks):
x = [a,b,c,d,e,f,g,h]
if Is_Allowed(x):
if Is_Win(x):
continue
for i in range(MaxPicks):
if Is_Allowed(x):
if Is_Win(x):
continue
Calls the main function
x = Time_It(Cycle)
print(x)
writes the probabilities to a text file
with open('result.txt','w') as file:
# file.write(pickle.dumps(x))
for item in x:
file.write(str(item) + ',' + str(x[item]) + '\n')
Ok, this time I hope I got your problem right:)
There are two insights (I guess you have them, just for the sake of the completeness) needed in order to speed up your program algorithmically:
(card_1, card_2)
and (card_2, card_1)
are not equal, so we cannot use the results from the urn problem, and it looks like we need to try out all permutations.2^N
instead of N!
states).For a set of picked cards set
the probability to pick a card i
in the next turn is:
norm:=sum Wi for i in set
P(i|set)=Wi/norm if i not in set else 0.0
The recursion for calculating P(set)
- the probability that a set of picked card occured during the game is:
set_without_i:=set/{i}
P(set)=sum P(set_without_i)*P(i|set_without_i) for i in set
However this should be done only for set_without_i
for which the game not ended yet, i.e. no group has 3 cards picked.
This can be done by means of recursion+memoization or, as my version does, by using bottom-up dynamic programming. It also uses binary representation of integers for representations of sets and (most important part!) returns the result almost instantly [('Grand', 0.0014104762718021384), ('Major', 0.0028878988709489244), ('Minor', 0.15321793072867956), ('Bonus', 0.84248369412856905)]
:
#calculates probability to end the game with 3 cards of a type
N=12
#set representation int->list
def decode_set(encoded):
decoded=[False]*N
for i in xrange(N):
if encoded&(1<<i):
decoded[i]=True
return decoded
weights = [170000, 170000, 105, 170000, 170000, 215, 150000, 150000, 12000, 105000, 105000, 105000]
def get_probs(decoded_set):
denom=float(sum((w for w,is_taken in zip(weights, decoded_set) if not is_taken)))
return [w/denom if not is_taken else 0.0 for w,is_taken in zip(weights, decoded_set)]
def end_group(encoded_set):
for i in xrange(4):
whole_group = 7<<(3*i) #7=..000111, 56=00111000 and so on
if (encoded_set & whole_group)==whole_group:
return i
return None
#MAIN: dynamic program:
MAX=(1<<N)#max possible set is 1<<N-1
probs=[0.0]*MAX
#we always start with the empty set:
probs[0]=1.0
#building bottom-up
for current_set in xrange(MAX):
if end_group(current_set) is None: #game not ended yet!
decoded_set=decode_set(current_set)
trans_probs=get_probs(decoded_set)
for i, is_set in enumerate(decoded_set):
if not is_set:
new_set=current_set | (1<<i)
probs[new_set]+=probs[current_set]*trans_probs[i]
#filtering wins:
group_probs=[0.0]*4
for current_set in xrange(MAX):
group_won=end_group(current_set)
if group_won is not None:
group_probs[group_won]+=probs[current_set]
print zip(["Grand", "Major", "Minor", "Bonus"], group_probs)
Some explanation of the "tricks" used in code:
A pretty standard trick is to use integer's binary representation to encode a set. Let's say we have objects [a,b,c]
, so we could represent the set {b,c}
as 110
, which would mean a
(first in the list corresponds to 0
- the lowest digit) - not in the set, b(1)
in the set, c(1)
in the set. However, 110
read as integer it is 6
.
The current_set - for loop simulates the game and best understood while playing. Let's play with two cards [a,b]
with weights [2,1]
.
We start the game with an empty set, 0
as integer, so the probability vector (given set, its binary representation and as integer mapped onto probability):
probs=[{}=00=0->1.0, 01={a}=1->0.0, {b}=10=2->0.0, {a,b}=11=3->0.0]
We process the current_set=0
, there are two possibilities 66% to take card a
and 33% to take cardb
, so the probabilities become after the processing:
probs=[{}=00=0->1.0, 01={a}=1->0.66, {b}=10=2->0.33, {a,b}=11=3->0.0]
Now we process the current_set=1={a}
the only possibility is to take b
so we will end with set {a,b}
. So we need to update its (3={a,b}
) probability via our formula and we get:
probs=[{}=00=0->1.0, 01={a}=1->0.66, {b}=10=2->0.33, {a,b}=11=3->0.66]
In the next step we process 2
, and given set {b} the only possibility is to pick card a
, so probability of set {a,b}
needs to be updated again
probs=[{}=00=0->1.0, 01={a}=1->0.66, {b}=10=2->0.33, {a,b}=11=3->1.0]
We can get to {a,b}
on two different paths - this could be seen in our algorithm. The probability to go through set {a,b}
at some point in our game is obviously 1.0
.
Another important thing: all paths that leads to {a,b}
are taken care of before we process this set (it would be the next step).