machine-learning, classification, gaussian, bayesian, kernel-density

Bayes Classification with Multivariate Parzen Window using Spherical Kernel


I'm having a problem implementing a Bayes Classifier with the Parzen window algorithm using a spherical (or isotropic) kernel.

I am running the algorithm on test data with 2 dimensions and 3 different classes (for each class, I have 10 test points and 40 training points, all in 2 dimensions). When I change the value of my hyperparameter (sigma_sq for the spherical Gaussian kernel), I find that it has no effect on how the points are classified.

This is my density estimator. My self.sigma_sq is the same across all dimensions of my data (2 dimensions):

    # log normalizing constant of the spherical Gaussian kernel
    # (the same for every pair of points, so computed once)
    c = -self.n_dims * np.log(2 * np.pi) / 2.0 - self.n_dims * np.log(self.sigma_sq) / 2.0
    log_prob = []
    for i in range(test_data.shape[0]):
        log_prob_intermediate = 0.0
        for j in range(n):  # n is the size of the training set
            sq_dist = np.sum((test_data[i, :] - self.train_data[j, :]) ** 2)
            log_prob_intermediate += c - sq_dist / (2.0 * self.sigma_sq)
        # average of the log kernel values over the training set
        log_prob.append(log_prob_intermediate / n)

How I implemented my Bayes Classifier: There are 3 classes that my Bayes Classifier must distinguish. I created 3 training sets and 3 test sets (one training and test set per class). For each point in my test set, I run the density estimator for each class on the point. This gives me a vector of 3 values: the log probability that my new point is in class1, class2, or class3. I then choose the maximum value and assign the new point to that class.
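
A minimal sketch of that decision rule (the estimator objects and the log_density method name are hypothetical; I assume equal class priors, since every class has the same number of training points):

    import numpy as np

    def classify(test_data, estimators):
        """Assign each test point to the class whose density estimator
        gives it the highest log probability (equal priors assumed)."""
        # log_probs[k, i] = log probability of test point i under class k
        log_probs = np.array([est.log_density(test_data) for est in estimators])
        return np.argmax(log_probs, axis=0)  # one class index per test point

    # e.g. labels = classify(test_points, [est_class1, est_class2, est_class3])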

  1. Since I am using a spherical Gaussian kernel, my understanding is that my sigma_sq must be the same across the dimensions within each density estimator (one density estimator per class). Is this correct? If I had a different sigma_sq for each dimension, wouldn't that give me something like a diagonal Gaussian kernel? (I write out the two kernel forms after these questions.)

  2. For my list of 30 test points (10 per class), I find that running the Bayes classifier on these points keeps giving me exactly the same classification for each point, regardless of what sigma I use. Is this normal? Since it's a spherical Gaussian kernel and all my dimensions use the same kernel, is increasing or decreasing my sigma_sq just having a proportional effect on my log probabilities, with no change in the classification? Or do I have some problem with my density estimator that I can't figure out?
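
For reference, the two kernel forms I am contrasting in question 1 (spherical with one shared variance, versus diagonal with one variance per dimension) are:

    % spherical (isotropic) Gaussian kernel: a single shared sigma^2
    K(x, x_j) = (2\pi\sigma^2)^{-d/2} \exp\!\left(-\frac{\lVert x - x_j \rVert^2}{2\sigma^2}\right)

    % diagonal Gaussian kernel: a separate sigma_k^2 per dimension
    K(x, x_j) = \prod_{k=1}^{d} (2\pi\sigma_k^2)^{-1/2} \exp\!\left(-\frac{(x_k - x_{j,k})^2}{2\sigma_k^2}\right)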


Solution

  • Let's address each point separately:

    1. Using the same sigma for each dimension does make your kernel radial (spherical); that much is true. However, you can (and should!) use a different sigma for each class, since each class distribution usually requires its own density estimator. For simple heuristics on choosing the kernel width in the Gaussian case, read for example about Scott's rule of thumb, or the later work by Silverman (see the bandwidth sketch after this list).
    2. It is hard to tell whether, in your particular case, the choice of sigma should change the classification; in general it should, but each dataset has its own properties. However, your data is just 2D, which makes it perfect for visualization. Plot your data, then plot each KDE, and simply investigate visually what is going on (see the sketch below).
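
A minimal sketch of both suggestions, assuming the training sets are plain NumPy arrays (the synthetic data below is made up for illustration). Scott's rule of thumb sets the bandwidth to n**(-1/(d+4)) times the sample standard deviation; note also that the Parzen estimate averages kernel values, not log kernel values, hence the logsumexp:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.special import logsumexp

    def scott_bandwidth(train):
        """Scott's rule of thumb: h = n**(-1/(d+4)) * std, per class."""
        n, d = train.shape
        return n ** (-1.0 / (d + 4)) * train.std(axis=0).mean()

    def kde_log_density(points, train, sigma_sq):
        """Log of the spherical-Gaussian Parzen estimate at each point."""
        d = train.shape[1]
        # squared distances between every query point and every training sample
        sq_dists = ((points[:, None, :] - train[None, :, :]) ** 2).sum(axis=-1)
        log_kernels = (-0.5 * d * np.log(2 * np.pi * sigma_sq)
                       - sq_dists / (2.0 * sigma_sq))
        # log of the *average* kernel value, computed stably
        return logsumexp(log_kernels, axis=1) - np.log(train.shape[0])

    # three made-up 2D training sets, one per class
    rng = np.random.default_rng(0)
    classes = [rng.normal(loc=m, scale=0.5, size=(40, 2)) for m in (0.0, 1.5, 3.0)]

    xx, yy = np.meshgrid(np.linspace(-2, 5, 200), np.linspace(-2, 5, 200))
    grid = np.column_stack([xx.ravel(), yy.ravel()])

    for k, train in enumerate(classes):
        h = scott_bandwidth(train)
        zz = kde_log_density(grid, train, h ** 2).reshape(xx.shape)
        plt.contour(xx, yy, zz)  # KDE contours for class k
        plt.scatter(train[:, 0], train[:, 1], s=8, label=f"class {k}")
    plt.legend()
    plt.show()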