Supervised classification: plotting K-NN accuracy for different sample sizes and k values


I hope you understand that something like this is hard to reproduce on a generic dataset.

Basically what I'm trying to do is perform K-NN with test and train sets of two different sizes for seven different values of k.

My main problem is that res should be a vector storing all the accuracy values for the same train-set size, but it holds only one value per iteration. As a result, I can't plot the accuracy graphs: they appear empty.

Do you know how to fix this problem?

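The Sonar data is freely available in R through the mlbench package.

library(mlbench) # provides the Sonar data set
library(class)   # provides knn()

data("Sonar")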

#Randomization of the sample
set.seed(123)

random <- sample(rep(1:dim(Sonar)[1]))

Sonar <- Sonar[random,]
head(Sonar)


for (i in c(50,100)){   #train/test set size
  sonar.train <- Sonar[1:i,-61]
  sonar.train.label <- Sonar[1:i,61]
  sonar.test <- Sonar[(1+i) :208,-61]
  sonar.test.label <- Sonar[(1+i) :208 ,61]
  res <- rep(NA,7)
  for (j in c(3,5,7,9,11,13,15)){     #values of k
    mod = knn(train= sonar.train, test = sonar.test, cl = sonar.train.label, k = j) #classification for test set
    err = sum(sonar.test.label==mod) #accuracy
    res[match(j,c(3,5,7,9,11,13,15))] = err/length(mod)  #put accuracy value in vector
    print(res)
    plot(x = c(3,5,7,9,11,13,15) ,y = res, type = "l" ,col = "blue", xlab = "Neighbours", ylab = "Accuracy") #plot the accuracy graphs for each of the two different train/test sets
    res <- rep(NA,7)
  }
}
#output
> 
 0.6835443        NA        NA        NA        NA        NA        NA
        NA 0.6582278        NA        NA        NA        NA        NA
        NA        NA 0.6075949        NA        NA        NA        NA
        NA        NA        NA 0.6265823        NA        NA        NA
        NA        NA        NA        NA 0.5949367        NA        NA
        NA        NA        NA        NA        NA 0.5949367        NA
        NA        NA        NA        NA        NA        NA 0.5506329
 0.6759259        NA        NA        NA        NA        NA        NA
        NA 0.6111111        NA        NA        NA        NA        NA
        NA        NA 0.5648148        NA        NA        NA        NA
        NA        NA        NA 0.5833333        NA        NA        NA
        NA        NA        NA        NA 0.5925926        NA        NA
        NA        NA        NA        NA        NA 0.5740741        NA
        NA        NA        NA        NA        NA        NA 0.5740741

The accuracy plots appear empty, with different labels for k on the x-axis.

Thank you for reading and helping me!


Solution

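  • Your inner loop is supposed to fill in the values of res, one per iteration. However, you reset res at the end of each iteration, so none of the previous values are kept. The plot is also drawn inside the inner loop, when res holds a single non-NA value, and a line plot (type = "l") needs at least two adjacent points to draw anything. That is why the plots appear empty.

    Move these two lines out of the inner loop, placing them after it but still inside the outer loop: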

      plot(x = c(3,5,7,9,11,13,15) ,y = res, type = "l" ,col = "blue", xlab = "Neighbours", ylab = "Accuracy") #plot the accuracy graphs for each of the two different train/test sets
      res <- rep(NA,7)
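
Putting that together, a minimal sketch of the corrected loops might look like the following. The wrapper function plot_knn_accuracy, its arguments, the returned list, and the plot title are illustrative additions that are not in the original post. The sketch assumes the class label is the last column of the data frame and that the rows have already been shuffled. It uses mean(mod == test_label), which gives the same value as sum(...)/length(mod) in the question.

```r
library(class)  # provides knn()

ks <- c(3, 5, 7, 9, 11, 13, 15)  # values of k from the question

# Hypothetical wrapper (not in the original post) around the corrected loops.
# Assumes `data` is already shuffled and its last column is the class label.
plot_knn_accuracy <- function(data, train_sizes, ks) {
  label_col <- ncol(data)
  n <- nrow(data)
  results <- list()

  for (i in train_sizes) {                  # train-set size
    train       <- data[1:i, -label_col]
    train_label <- data[1:i, label_col]
    test        <- data[(i + 1):n, -label_col]
    test_label  <- data[(i + 1):n, label_col]

    res <- rep(NA_real_, length(ks))        # reset once per train-set size

    for (j in seq_along(ks)) {              # index into the values of k
      mod <- knn(train = train, test = test, cl = train_label, k = ks[j])
      res[j] <- mean(mod == test_label)     # fraction of correct predictions
    }

    # Plot after the inner loop, once res is fully populated
    plot(x = ks, y = res, type = "l", col = "blue",
         xlab = "Neighbours", ylab = "Accuracy",
         main = paste("Train size:", i))

    results[[as.character(i)]] <- res
  }

  invisible(results)
}

# Usage with the Sonar data, run only if the mlbench package is installed
if (requireNamespace("mlbench", quietly = TRUE)) {
  data("Sonar", package = "mlbench")
  set.seed(123)
  Sonar <- Sonar[sample(nrow(Sonar)), ]     # shuffle the rows
  acc <- plot_knn_accuracy(Sonar, c(50, 100), ks)
  print(acc)
}
```

Indexing the inner loop with seq_along(ks) also removes the need for match(j, c(3,5,7,9,11,13,15)), so the list of k values is written only once.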