I would like to create a Gaussian Kernel Density Estimate of two samples of fantasy team scores over the first six weeks of the NFL season. To do this, I created a list of two different KernelDensity objects and plotted the log probability of each score from 0 to 300 according to each KDE function. At this point I have different log probabilities for each score. For some reason, however, when I exponentiate each log probability, I suddenly have equal values for both KDEs.
A successful answer clearly identifies what mistake has been made that somehow passes each score to the second KDE and then provides a solution.
# Import Modules
import math
from sklearn.neighbors import KernelDensity
import numpy as np
X = np.array([[132,151,109,71,104,100],[123,182,102,123,108,82]]).transpose()
# Create a list to put two KernelDensity objects in
kde = [[],[]]
for i in range(2):
    kde[i] = KernelDensity(kernel='gaussian', bandwidth=5).fit(X[:,i].reshape(-1,1))
# Create a list to place the log probabilities
log_prob = [[],[]]
for i in range(2):
    X = np.arange(0,300,1)
    log_prob[i] = kde[i].score_samples(X.reshape(-1,1)).reshape(-1,1)
# (A mistake has been made in this section) Create a list for the probability of each score according to the two different KDEs
prob = [[0]*300]*2
for i in range(2):
    for j in range(300):
        prob[i][j] = math.exp(log_prob[i][j])
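Yeah, it has to do with how you are constructing the lists. prob = [[0]*300]*2 does not create two independent lists: the outer *2 repeats a reference to the same inner list, so prob[0] and prob[1] are one and the same object. When the loop reaches i = 1, every assignment to prob[1][j] also overwrites prob[0][j], which is why both rows end up holding the second KDE's probabilities. The log probabilities look different only because log_prob[0] and log_prob[1] are separate arrays returned by score_samples.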
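A minimal reproduction with plain Python lists makes the aliasing visible. It assumes two short, made-up lists of log probabilities (log_a and log_b are hypothetical stand-ins for the two score_samples outputs) and uses the same list construction as the question:

```python
import math

# Hypothetical log probabilities standing in for the two KDE outputs
log_a = [math.log(0.1), math.log(0.2), math.log(0.3)]
log_b = [math.log(0.4), math.log(0.5), math.log(0.6)]
log_prob = [log_a, log_b]

# Same construction as in the question: the outer * 2 repeats one reference
prob = [[0] * 3] * 2
print(prob[0] is prob[1])  # True: both rows are the same list object

for i in range(2):
    for j in range(3):
        prob[i][j] = math.exp(log_prob[i][j])

# The second pass (i = 1) overwrote the values written during the first pass
print([round(p, 6) for p in prob[0]])  # [0.4, 0.5, 0.6]
print([round(p, 6) for p in prob[1]])  # [0.4, 0.5, 0.6]
```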
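The way around this is to make sure each row is its own list: either build the rows with a list comprehension such as [[0]*300 for _ in range(2)] (or give each row its own .copy()), or, as I did here, create a fresh list for each KDE and append it, instead of assigning by index into a nested list built with *.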
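Both fixes can be sketched with the same made-up values. The comprehension is the usual idiom for preallocating a nested list, and build_rows is a hypothetical helper name used only to package the append-based approach:

```python
import math

# Hypothetical log probabilities standing in for the two KDE outputs
log_prob = [
    [math.log(0.1), math.log(0.2), math.log(0.3)],
    [math.log(0.4), math.log(0.5), math.log(0.6)],
]

# Option 1: the comprehension evaluates [0] * 3 once per row,
# so every row is a distinct list
prob_indexed = [[0] * 3 for _ in range(2)]
print(prob_indexed[0] is prob_indexed[1])  # False
for i in range(2):
    for j in range(3):
        prob_indexed[i][j] = math.exp(log_prob[i][j])

# Option 2: build a fresh list per KDE and append it
def build_rows(log_rows):
    rows = []
    for log_row in log_rows:
        row = []
        for value in log_row:
            row.append(math.exp(value))
        rows.append(row)
    return rows

prob_appended = build_rows(log_prob)

print([round(p, 6) for p in prob_indexed[0]])   # [0.1, 0.2, 0.3]
print([round(p, 6) for p in prob_appended[1]])  # [0.4, 0.5, 0.6]
```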
# Import Modules
import math
from sklearn.neighbors import KernelDensity
import numpy as np
X = np.array([[132,151,109,71,104,100],[123,182,102,123,108,82]]).transpose()
# Create a list to put two KernelDensity objects in
kde = [[],[]]
for i in range(2):
    kde[i] = KernelDensity(kernel='gaussian', bandwidth=5).fit(X[:,i].reshape(-1,1))
# Create a list to place the log probabilities
log_prob = [[],[]]
for i in range(2):
    X = np.arange(0,300,1)
    log_prob[i] = kde[i].score_samples(X.reshape(-1,1)).reshape(-1,1)
# Create a list for the probability of each score according to the two different KDEs
# (fixed: each KDE gets its own, newly created inner list)
prob = []
for i in range(2):
    prob_list_alpha = []
    for j in range(300):
        # log_prob[i] has shape (300, 1), so [j, 0] extracts a plain float
        prob_list_alpha.append(math.exp(log_prob[i][j, 0]))
    prob.append(prob_list_alpha)
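Since score_samples already returns a NumPy array, the list bookkeeping can also be skipped entirely by staying in NumPy and calling np.exp on the whole array. This is a sketch that assumes a (2, 300) array is an acceptable result in place of nested lists; grid is a name introduced here for the 0 to 299 score range, so the data array X is no longer overwritten inside the loop:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Six weekly scores per team, one column per team (same data as above)
X = np.array([[132, 151, 109, 71, 104, 100],
              [123, 182, 102, 123, 108, 82]]).transpose()

# Evaluation grid kept separate from the data, shape (300, 1)
grid = np.arange(0, 300, 1).reshape(-1, 1)

# One Gaussian KDE per team, fitted on that team's column
kde = [KernelDensity(kernel='gaussian', bandwidth=5).fit(X[:, i].reshape(-1, 1))
       for i in range(2)]

# score_samples returns log densities of shape (300,); stack into (2, 300)
log_prob = np.vstack([k.score_samples(grid) for k in kde])

# np.exp is elementwise and returns a new array, so the rows stay independent
prob = np.exp(log_prob)

print(prob.shape)                     # (2, 300)
print(np.allclose(prob[0], prob[1]))  # False
```

A side benefit of the 2-D array is that there is no nested-list aliasing to worry about: np.exp allocates a single contiguous result.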