I'm studying GANs (and I'm a beginner in Python), and I found this piece of code in one of the previous exercises that I don't understand. Specifically, I don't understand why the Boolean expression in the line `Xk = X[Y == k]` is used:
```python
import numpy as np


class BayesClassifier:
    def fit(self, X, Y):
        # assume classes are numbered 0...K-1
        self.K = len(set(Y))
        self.gaussians = []
        self.p_y = np.zeros(self.K)
        for k in range(self.K):
            Xk = X[Y == k]
            self.p_y[k] = len(Xk)
            mean = Xk.mean(axis=0)
            # np.cov treats each row as a variable, hence the transpose
            cov = np.cov(Xk.T)
            g = {'m': mean, 'c': cov}
            self.gaussians.append(g)
        # normalize p(y)
        self.p_y /= self.p_y.sum()
```
I feel that I'm not understanding something very basic.
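You should take into account that `X` and `Y` are NumPy arrays, not scalars (`k` is a plain integer from `range(self.K)`), and that some operators are overloaded for arrays, in particular `==` and Boolean-based indexing. `==` performs an element-wise comparison, not a comparison of the whole array. When one side is a scalar such as `k`, it is broadcast, so every element of `Y` is compared with `k`.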
See how it works:
```
In [9]: Y = np.array([0,1,2])
In [10]: k = np.array([0,1,3])
In [11]: Y==k
Out[11]: array([ True, True, False])
```
So the result of `==` is a Boolean array.
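In `fit()`, `k` is not an array but an integer taken from `range(self.K)`, so the comparison there is between an array and a scalar. A minimal sketch of that case, with hypothetical toy labels (the values of `Y` and `k` are made up for illustration):

```python
import numpy as np

# Hypothetical toy labels: 5 samples belonging to classes 0, 1 and 2
Y = np.array([0, 1, 1, 2, 1])

# Inside fit(), k comes from range(self.K), so it is a plain Python int
k = 1

# The scalar k is broadcast: each element of Y is compared with k,
# producing one bool per sample
mask = Y == k
print(mask)        # [False  True  True False  True]
print(mask.dtype)  # bool
```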
```
In [12]: X=np.array([0,2,4])
In [13]: X[Y==k]
Out[13]: array([0, 2])
```
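In the classifier, `X` is 2-D, with one row per sample and one column per feature, so a 1-D Boolean mask selects whole rows. A small sketch on hypothetical toy data; the `np.nonzero` line is only there to show an equivalent selection with integer indices:

```python
import numpy as np

# Hypothetical data: 5 samples (rows) with 2 features each
X = np.array([[0.0, 1.0],
              [2.0, 3.0],
              [4.0, 5.0],
              [6.0, 7.0],
              [8.0, 9.0]])
Y = np.array([0, 1, 1, 2, 1])

k = 1
# A 1-D Boolean mask as the first index keeps the rows where it is True;
# the number of columns (features) is unchanged
Xk = X[Y == k]
print(Xk)
# [[2. 3.]
#  [4. 5.]
#  [8. 9.]]
print(Xk.shape)  # (3, 2)

# Equivalent selection using the positions where the mask is True
idx = np.nonzero(Y == k)[0]   # array([1, 2, 4])
assert np.array_equal(Xk, X[idx])
```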
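The result is an array with the elements of `X` at the positions where the condition is `True`. In the classifier, `X` is 2-D (one row per sample), so `X[Y == k]` selects the whole rows whose label equals `k`. Hence `len(Xk)` is the number of rows of `Xk`, i.e. the number of samples in class `k`, not a count of matches between `X` and `k`. That count is stored in `self.p_y[k]`, and the last line of `fit()` normalizes the counts into the class prior p(y).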
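Putting it together, the loop in `fit()` counts the samples of each class and then normalizes the counts. The sketch below repeats only that part of `fit()` outside the class, on hypothetical labels. The `np.bincount` check at the end is an alternative added for comparison, not part of the original code, and it assumes the labels are numbered 0...K-1 as the original comment states.

```python
import numpy as np

# Hypothetical labels for 5 samples in K = 3 classes
Y = np.array([0, 1, 1, 2, 1])
# 5 rows, 2 features; the values do not matter for counting
X = np.arange(10.0).reshape(5, 2)

K = len(set(Y))
p_y = np.zeros(K)
for k in range(K):
    Xk = X[Y == k]
    # len() of a 2-D array is its number of rows: the samples labelled k
    p_y[k] = len(Xk)

print(p_y)        # [1. 3. 1.]
p_y /= p_y.sum()  # normalize the counts into the prior p(y = k)
print(p_y)        # [0.2 0.6 0.2]

# Same result without the loop (assumes labels are 0...K-1)
assert np.allclose(p_y, np.bincount(Y, minlength=K) / len(Y))
```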