I hope you understand that it is hard to replicate something like this on a generic dataset.
Basically, I'm trying to run k-NN with train and test sets of two different sizes, for seven different values of k.
My main problem is that res should be a vector storing all the accuracy values for the same train-set size. Instead, it shows only one value per iteration, so I can't plot the accuracy graphs: they appear empty.
Do you know how to fix this problem?
The data is freely available in R (the Sonar data set from the mlbench package).
library(mlbench) # provides the Sonar data set
library(class) # provides knn()
data("Sonar")
#Randomization of the sample
set.seed(123)
random <- sample(rep(1:dim(Sonar)[1]))
Sonar <- Sonar[random,]
head(Sonar)
for (i in c(50,100)){ #train/test set size
  sonar.train <- Sonar[1:i,-61]
  sonar.train.label <- Sonar[1:i,61]
  sonar.test <- Sonar[(1+i):208,-61]
  sonar.test.label <- Sonar[(1+i):208,61]
  res <- rep(NA,7)
  for (j in c(3,5,7,9,11,13,15)){ #values of k
    mod = knn(train = sonar.train, test = sonar.test, cl = sonar.train.label, k = j) #classification for test set
    err = sum(sonar.test.label==mod) #number of correct predictions
    res[match(j,c(3,5,7,9,11,13,15))] = err/length(mod) #put accuracy value in vector
    print(res)
    plot(x = c(3,5,7,9,11,13,15), y = res, type = "l", col = "blue", xlab = "Neighbours", ylab = "Accuracy") #plot the accuracy graphs for each of the two different train/test sets
    res <- rep(NA,7)
  }
}
#output
>
0.6835443 NA NA NA NA NA NA
NA 0.6582278 NA NA NA NA NA
NA NA 0.6075949 NA NA NA NA
NA NA NA 0.6265823 NA NA NA
NA NA NA NA 0.5949367 NA NA
NA NA NA NA NA 0.5949367 NA
NA NA NA NA NA NA 0.5506329
0.6759259 NA NA NA NA NA NA
NA 0.6111111 NA NA NA NA NA
NA NA 0.5648148 NA NA NA NA
NA NA NA 0.5833333 NA NA NA
NA NA NA NA 0.5925926 NA NA
NA NA NA NA NA 0.5740741 NA
NA NA NA NA NA NA 0.5740741
The accuracy plots appear empty, and the x axis shows different labels for k.
Thank you for reading and helping me!
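Your inner loop is supposed to fill in the values of res, one per iteration. However, you reset res at the end of each iteration of that loop, so none of the previous values are kept. At the moment plot() is called, res holds a single non-NA value, and type = "l" needs at least two adjacent points to draw a line segment, which is why each plot comes out empty.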
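To see the effect in isolation, here is a minimal base-R sketch that does not use the Sonar data or knn() at all. The names ks, idx, seen and res_ok are made up for illustration, and ks[idx] / 10 is just a placeholder standing in for an accuracy value.

```r
ks <- c(3, 5, 7)

# Version 1: reset inside the loop (the pattern in the question)
res <- rep(NA_real_, length(ks))
seen <- list()
for (idx in seq_along(ks)) {
  res[idx] <- ks[idx] / 10          # placeholder for an accuracy value
  seen[[idx]] <- res                # what print(res) would show at this point
  res <- rep(NA_real_, length(ks))  # wipes the value that was just stored
}
print(seen[[3]])  # only the third slot is filled

# Version 2: initialise once, before the loop
res_ok <- rep(NA_real_, length(ks))
for (idx in seq_along(ks)) {
  res_ok[idx] <- ks[idx] / 10
}
print(res_ok)     # all three slots are filled
```

In the first version every snapshot contains exactly one non-NA entry, which matches the staircase pattern in the printed output of the question.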
These two lines need to be outside the inner loop (and inside the outer loop), that is, placed after the inner loop's closing brace:
plot(x = c(3,5,7,9,11,13,15) ,y = res, type = "l" ,col = "blue", xlab = "Neighbours", ylab = "Accuracy") #plot the accuracy graphs for each of the two different train/test sets
res <- rep(NA,7)
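Put together, the loop could look roughly like the sketch below. It assumes the mlbench package (for Sonar) and the class package (for knn()) are installed. The names ks, idx, n and results are my own additions, and mean(pred == test_label) is simply a shorter way of computing the same accuracy as sum(...)/length(...).

```r
library(mlbench)  # provides the Sonar data set
library(class)    # provides knn()

data("Sonar")
set.seed(123)
Sonar <- Sonar[sample(nrow(Sonar)), ]    # shuffle the rows

ks <- c(3, 5, 7, 9, 11, 13, 15)          # values of k to try
n <- nrow(Sonar)
results <- list()                        # keeps one accuracy vector per train size

for (i in c(50, 100)) {                  # training-set size
  train       <- Sonar[1:i, -61]
  train_label <- Sonar[1:i, 61]
  test        <- Sonar[(i + 1):n, -61]
  test_label  <- Sonar[(i + 1):n, 61]

  res <- rep(NA_real_, length(ks))       # reset once per training-set size
  for (idx in seq_along(ks)) {
    pred <- knn(train = train, test = test, cl = train_label, k = ks[idx])
    res[idx] <- mean(pred == test_label) # share of correct predictions
  }

  # Print and plot only after all seven values have been filled in
  print(res)
  plot(ks, res, type = "l", col = "blue",
       xlab = "Neighbours", ylab = "Accuracy",
       main = paste("Training size:", i))
  results[[as.character(i)]] <- res
}
```

Because res is created at the top of the outer loop, the second reset after plot() is no longer needed. Storing each vector in results is optional, but it makes it easy to compare the two training sizes afterwards.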