Long story short, I’m rolling unfair dice and trying to find the probability of any given sum after x rolls. In practice there are hundreds of faces and hundreds of rolls, so I can't simply enumerate all possible combinations and then find the probabilities, due to list size and memory restrictions.
I’m working around this problem by finding all the possible combinations and their probabilities of occurring one roll at a time. After I find all the possible combinations and the probability of each combo, I need to condense the result by combining the probabilities of combinations with the same sum.
Think of the combos (1,3) and (2,2): both are unique combos, but they both have a sum of 4. If the probability of (1,3) occurring is 3/6 and the probability of (2,2) occurring is 1/6, then the probability of combos with the sum of 4 is 4/6.
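For what it's worth, the arithmetic for that example looks like this (a minimal sketch using Python's fractions module; the variable names are just for illustration):

from fractions import Fraction

# combos that share a sum pool their probabilities
p_13 = Fraction(3, 6)      # probability of the combo (1, 3)
p_22 = Fraction(1, 6)      # probability of the combo (2, 2)
p_sum_4 = p_13 + p_22      # Fraction(2, 3), i.e. 4/6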
Hopefully I don’t need to go into the details about why this works and the math behind it, but if more information is needed, please let me know.
I’ve got it to the point that I have a list containing the sum of each possible combination after a roll, as well as a parallel list containing the probabilities of these sums.
For our example it’s something like this:
The sums of this roll's combos:
Sums = [6,4,2,5,4,3]
The parallel list of probabilities:
Probs = [11/36, 9/36, 7/36, 5/36, 3/36, 1/36]
In the real case the numbers are long decimals and don't (and can't) have a common denominator, but I'm using these numbers to show what I'm looking for.
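For anyone who wants to reproduce the toy example exactly, the two lists can be set up with fractions.Fraction (purely my choice for illustration; the real case just has long decimals):

from fractions import Fraction

Sums = [6, 4, 2, 5, 4, 3]
Probs = [Fraction(n, 36) for n in (11, 9, 7, 5, 3, 1)]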
Order is important. I can remove the dupe sums from the sums list and maintain the order I need by using:
condensed_sums = []
for x in Sums:
    if x not in condensed_sums:
        condensed_sums.append(x)

This gives us condensed_sums = [6, 4, 2, 5, 3].
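As an aside, the same order-preserving de-duplication can be done in one line, since dict keys keep insertion order on Python 3.7+:

# keeps the first occurrence of each sum, in order
condensed_sums = list(dict.fromkeys(Sums))   # [6, 4, 2, 5, 3]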
My problem is that the corresponding probability doesn’t just go away, and the probability of the first occurrence and the probabilities of any duplicate occurrences are not the same. To keep the probability of each sum accurate, before I remove a duplicate occurrence’s probability I need to add it to the original occurrence’s probability. Ideally, the condensed probability list will look like this:
condensed_probs = [11/36, 12/36, 7/36, 5/36, 1/36]
I was thinking that maybe I could zip the lists together and get something like:
Zipped_Sum_Prob = [(6, 11/36), (4, 9/36), (2, 7/36), (5, 5/36), (4, 3/36), (3, 1/36)]
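Getting that zipped list itself is the easy part:

# pair each sum with its corresponding probability
Zipped_Sum_Prob = list(zip(Sums, Probs))

The harder part is still combining the pairs that share a sum.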
Then maybe use an if/else inside a list comprehension, but I'm still really new to this and haven't been able to think of a way to make it work.
As another note, because I'm not sure if it will make a difference: any sum might occur several times, not just twice.
Any help is appreciated, even if it’s simply a better way to ask the question.
Thanks!
edit:
I thought it might be more useful to give a little more background as to why I need to maintain the order. Here is what I have so far:
Assuming:

faces = [1, 2, 3]    # in reality this is a random-length list with random values
new_faces = faces
rolls = 5            # stands in for a random number of rolls
prob = [.5, .333333, .1666666]   # a parallel list of random probabilities
new_prob = prob
# en stands for enumerated
en_new_faces = list(enumerate(new_faces))
en_faces = list(enumerate(faces))
# Why new_faces? For the first roll, the possible sums are the same as the values on the
# faces of the die, but on the second roll we start to have combos, and the number of
# unique sums/possible outcomes is greater than the number of faces on the original die.
# On the 3rd roll we need to treat each unique sum like a face of one of our dice, because
# each unique sum can be increased by any of the values on the faces of our original die.
# We don't need to apply this to each unique combo, though, because adding 3 to the combo
# (2 + 2) and adding 3 to (1 + 3) gives us the same outcome. So:
new_faces = [(w, x, y, z) for w, x in en_faces for y, z in en_new_faces if w <= y]
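For faces = [1, 2, 3], where en_faces and en_new_faces are both [(0, 1), (1, 2), (2, 3)], this comes out as:

print(new_faces)
# [(0, 1, 0, 1), (0, 1, 1, 2), (0, 1, 2, 3),
#  (1, 2, 1, 2), (1, 2, 2, 3), (2, 3, 2, 3)]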
# rolls > 1 because the first roll is the same as faces
while rolls > 1:
    # new_list finds the sums of combos that happen twice, such as (1,3) & (3,1)
    new_list = [x + z for w, x, y, z in new_faces if w != y]
    # even_list finds the sums of combos that happen once, such as (1,1) or (2,2).
    # Even though these combos only happen once, the sum may not be unique.
    even_list = [x + z for w, x, y, z in new_faces if w == y]
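With the faces = [1, 2, 3] example on the second roll, those two comprehensions come out as:

print(new_list)    # [3, 4, 5]  -> sums of (1,2), (1,3), (2,3)
print(even_list)   # [2, 4, 6]  -> sums of (1,1), (2,2), (3,3)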
These lists are separated because the probability of each even_list combo is 1/(len(faces)**rolls) and the probability of each new_list combo is 2/(len(faces)**rolls).
By keeping the lists ordered, we keep it so that the first len(faces) numbers happen once and the remaining numbers in new_faces happen twice, no matter how many rolls we go through.
Here is how I've been keeping it in order:
    for x in even_list:
        new_list.append(x)
    new_list = list(reversed(new_list))
    rolls = rolls - 1
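With the faces = [1, 2, 3] example, after the first pass through this step:

print(new_list)   # [6, 4, 2, 5, 4, 3]

That is the same list as the Sums example at the top: the first len(faces) entries come from combos that occur once and the rest come from combos that occur twice.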
And that leads us to my question.
You should use a mapping data structure to collect the sums:
from collections import OrderedDict

# accumulate the probabilities by sum, preserving the order in which sums first appear
d = OrderedDict()
for s, p in zip(Sums, Probs):
    d[s] = d.get(s, 0) + p

Sums = [*d.keys()]      # unique sums, in their original order
Probs = [*d.values()]   # combined probability for each sum
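Running this on the example lists condenses them to Sums = [6, 4, 2, 5, 3] and Probs = [11/36, 12/36, 7/36, 5/36, 1/36], which is the output hoped for above. On Python 3.7 and later a plain dict preserves insertion order as well, so the same loop works with d = {} and no import; OrderedDict just makes the ordering guarantee explicit on older versions.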