
How do you handle deadlocks using Proc Discrim in SAS for KNN?


I have a proc discrim statement which runs a KNN analysis. When I set k = 1 then it assigns everything a category (as expected). But when k > 1 it leaves some observations unassigned (sets category as Other).

I'm assuming this is a result of deadlock votes for two or more of the categories. I know there are ways around this by either taking a random one of the deadlocked votes as the answer, or taking the nearest of the deadlocked votes as the answer.

Is this functionality available in proc discrim? How do you tell it how to deal with deadlocks?

Cheers!


Solution

  • Your assumption is correct: when the number of nearest neighbors is two or more, an observation is assigned to the "Other" class because it has the same probability of assignment to two or more of the designated classes. You can see this by specifying the OUT= option in the PROC DISCRIM statement (for example, OUT=SASdsn), which writes a SAS output data set showing how the procedure classified the input observations. This output data set contains the probabilities of assignment to each of the designated classes. For example, using two nearest neighbors (K=2) with the iris data set yields five observations that the procedure classifies as ambiguous, each with a probability of 0.50 of being assigned to either the Versicolor or the Virginica class.

    PROC DISCRIM does not break these ties for you, but you can resolve them in a subsequent DATA step. One approach is to select the ambiguously classified observations from the output data set and assign each one randomly to one of its tied classes. Another is to compare each ambiguous observation's values on the classification variables with the class means of those variables, for example by calculating a squared distance (with or without standardizing by the standard deviation of each variable), and then to assign the observation to the "closest" class.
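
The random tie-breaking approach could be sketched as follows. This is a minimal illustration rather than a tested solution: it assumes the SASHELP.IRIS sample data set, and it assumes that the posterior probability variables in the OUT= data set are named after the class levels (Setosa, Versicolor, Virginica) and that the _INTO_ variable is missing for observations labeled "Other". The data set names knn_out and knn_resolved, the variable Resolved, and the seed are made up for the example.

```sas
/* Nonparametric KNN classification with k = 2; OUT= keeps the   */
/* posterior probabilities and the assigned class (_INTO_).      */
proc discrim data=sashelp.iris method=npar k=2 out=knn_out noprint;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Break ties at random among the classes that share the largest */
/* posterior probability.                                        */
data knn_resolved;
   set knn_out;
   length Resolved $ 10;

   /* Assumed names: one posterior variable per class level */
   array post {3} Setosa Versicolor Virginica;
   array lbl  {3} $ 10 _temporary_ ('Setosa' 'Versicolor' 'Virginica');

   /* Arbitrary seed so the random assignment is reproducible */
   if _n_ = 1 then call streaminit(12345);

   if not missing(_INTO_) then Resolved = _INTO_;   /* no tie: keep PROC DISCRIM's class */
   else do;
      maxp = max(of post{*});

      /* Count the classes tied at the maximum probability */
      nties = 0;
      do i = 1 to 3;
         if abs(post{i} - maxp) < 1e-8 then nties = nties + 1;
      end;

      /* Pick one of the tied classes with equal probability */
      pick = ceil(nties * rand('uniform'));
      j = 0;
      do i = 1 to 3;
         if abs(post{i} - maxp) < 1e-8 then do;
            j = j + 1;
            if j = pick then Resolved = lbl{i};
         end;
      end;
   end;

   drop maxp nties pick i j;
run;
```

Running PROC FREQ on Species by Resolved afterwards would show how the previously unassigned observations were distributed. To take the nearest of the tied classes instead, the random pick could be replaced by a distance calculation against class means obtained from, say, PROC MEANS.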