Given multiple probability distributions, is there an optimized function in scipy (or other library) that allows me to add distributions? Consider the simple example:
Suppose I have a 6-sided die and a 20-sided die, and I want the probability mass function of the sum of the two rolls, which ranges from 2 to 26.
from scipy.stats import randint
import pandas as pd
import plotly.express as px
x_points_6 = [x+1 for x in range(6)]
x_points_20 = [x+1 for x in range(20)]
dist_6_sided = [randint.pmf(k,1,7) for k in x_points_6]
dist_20_sided = [randint.pmf(k,1,21) for k in x_points_20]
# add_dist is my own helper, defined further down
total_dist = add_dist(x_points_6, dist_6_sided, x_points_20, dist_20_sided)
df = pd.DataFrame({'x':total_dist[0], 'y':total_dist[1]})
I have a function to somewhat brute force the addition of the distributions:
def add_dist(value1, prob1, value2, prob2):
    result_value = []
    result_prob = []
    for prob1_i, value1_i in zip(prob1, value1):
        for prob2_i, value2_i in zip(prob2, value2):
            value = value1_i + value2_i
            prob = prob1_i * prob2_i
            result_value.append(value)
            result_prob.append(prob)
    unique_values = set(result_value)
    count_x = []
    for i in unique_values:
        count_x.append(result_value.count(i))
    return([result_value, result_prob])
And with this, I can get the resulting distribution:
df_added_pdfs = df.groupby(['x']).sum()
fig = px.bar(df_added_pdfs)
fig.show()
I am looking for a solution that works with any of scipy's built-in discrete or continuous distributions, not just the simple uniform case. I may be missing the right search term: this seems routine enough that I would expect scipy or numpy to provide a function for it, ideally one more optimized than my nested loop.
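The distribution of the sum of two independent random variables is the convolution of their individual distributions, so you can convolve the two PMFs: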
from scipy.stats import randint
from scipy.signal import convolve
import pandas as pd
import plotly.express as px
x_points_6 = [x+1 for x in range(6)]
x_points_20 = [x+1 for x in range(20)]
dist_6_sided = [randint.pmf(k,1,7) for k in x_points_6]
dist_20_sided = [randint.pmf(k,1,21) for k in x_points_20]
conv = convolve(dist_6_sided, dist_20_sided)
df = pd.DataFrame({'x':range(2, 26+1), 'y':conv})
df_added_pdfs = df.groupby(['x']).sum()
fig = px.bar(df_added_pdfs)
fig.show()
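To go beyond two uniform dice, the same idea can be wrapped in small helpers. The following is only a sketch under some assumptions: `add_discrete` and `add_continuous` are hypothetical names (not scipy functions), the variables are independent, each discrete PMF is given on consecutive integers starting at a known value, and each continuous PDF is sampled on an evenly spaced grid with the same step and wide enough that the truncated tails are negligible. The continuous result is a numerical approximation, not an exact density.

```python
import numpy as np
from scipy.stats import randint, poisson, norm


def add_discrete(pmf1, start1, pmf2, start2):
    """PMF of X + Y for independent integer-valued X and Y.

    pmf1[i] is P(X = start1 + i) and pmf2[j] is P(Y = start2 + j).
    (Hypothetical helper, not part of scipy.)
    """
    pmf = np.convolve(pmf1, pmf2)
    values = np.arange(len(pmf)) + start1 + start2
    return values, pmf


def add_continuous(pdf1, x1, pdf2, x2):
    """Approximate PDF of X + Y from PDFs sampled on grids x1 and x2.

    Assumes both grids are evenly spaced with the same step.
    (Hypothetical helper, not part of scipy.)
    """
    dx = x1[1] - x1[0]
    # Riemann-sum approximation of the convolution integral
    pdf = np.convolve(pdf1, pdf2) * dx
    x = x1[0] + x2[0] + dx * np.arange(len(pdf))
    return x, pdf


# Two dice, as in the question
k6 = np.arange(1, 7)
k20 = np.arange(1, 21)
values, pmf = add_discrete(randint.pmf(k6, 1, 7), 1,
                           randint.pmf(k20, 1, 21), 1)

# A d6 plus a Poisson(3) count; the Poisson support is truncated at 40
k_pois = np.arange(0, 41)
values_p, pmf_p = add_discrete(randint.pmf(k6, 1, 7), 1,
                               poisson.pmf(k_pois, 3), 0)

# Two normal variables, N(0, 1) and N(1, 2**2), on grids with step 0.01
x1 = np.linspace(-10, 10, 2001)
x2 = np.linspace(-15, 17, 3201)
x_sum, pdf_sum = add_continuous(norm.pdf(x1, 0, 1), x1,
                                norm.pdf(x2, 1, 2), x2)
```

For the normal case the result can be checked against the known closed form: the sum of independent N(0, 1) and N(1, 4) variables is N(1, 5), so `pdf_sum` should closely match `norm.pdf(x_sum, 1, np.sqrt(5))`. For long arrays, `scipy.signal.fftconvolve` may be faster than a direct convolution.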