r artificial-intelligence knn mahalanobis

Problems using the Mahalanobis distance in the KNN algorithm


I'm a student working on a homework assignment in which I need to implement the KNN algorithm using the Mahalanobis distance as the metric, but for some reason I can't figure out, my code is not working.

I'm not an R expert; I only know the basics.

library(FNN)
library(readr)
library(pvclass)
library(corpcor)
iono <- read_csv("C:/Users/bruno/Dropbox/Eng. Computação/17.2/IA/Prática Ionosphere/ionosphere.data.txt", 
             col_names = FALSE)
p <- 0.8
iono <- as.matrix(iono)
# generate indices to select rows of the matrix
train_idx <- sample(x = nrow(iono), size = p*nrow(iono), replace = FALSE)
test_idx <- c(1:nrow(iono))[-train_idx]

# build the matrices with the training/test data
iono_train_x <- iono[train_idx, 1:34]
iono_train_y <- iono[train_idx, 35]
iono_test_x <- iono[test_idx, 1:34]
iono_test_y <- iono[test_idx, 35]

# -------- Implementing the Mahalanobis distance function for KNN
# x - training data matrix
# xtest - test data matrix
# cx - Mahalanobis parameter matrix
mahalanobis_xy_knn <- function(xtest, x, cx) {

Mdist <- matrix(0, nrow = nrow(xtest), ncol = nrow(x))
for(i in 1:nrow(xtest)){
Mdist[i,] <- mahalanobis(x = x, center = xtest[i, ], cov = cx, inverted = TRUE)
}

return(Mdist)
}


# --------------- KNN algorithm with the Mahalanobis metric
knn_custom <- function(Xtrain, Xtest, Ytrain, k, M){

# ------ Get the distances Dist(MxN) from every Xtest(Mxd) to every Xtrain(Nxd)
# ------ using the learned metric
Dist <- mahalanobis_xy_knn(Xtest,Xtrain, M)
#   Dist <- dist()
#dados <- data.frame(Dist, Ytrain)

Yhat <- matrix(0, nrow = nrow(Xtest), ncol = 1) 

label_um <- 0
label_dois <- 0
# ---- Compute the label of each Xtest
for(i in 1:length(Yhat)){
# Group Dist and Y in a data frame
dados <- data.frame(Dist[i,], Ytrain)
# Sort the data frame by distance
ind <- order(dados$Dist.i...)
# Take the K nearest labels
k_labels_proximos <- Y[ind[1:k]]
# Check the majority
for(j in 1:k){
  if (k_labels_proximos[j] == 1) label_um <- label_um + 1
  else label_dois <- label_dois + 1
}

if(label_um > label_dois) Yhat[i] <- 1
else if(label_um < label_dois) Yhat[i] <- 2

label_um <- 0
label_dois <- 0
}

return(Yhat)
}


# ------------- Learning the Mahalanobis metric
#dados <- data.frame(iono_test_x,iono_test_y)
M_cov <- cov(iono)
inv_m_cov <- pseudoinverse(iono)
M_ident <- diag(ncol(iono))

# ------ Apply K-NN with the Mahalanobis metric using the covariance matrix
saida_knn_maha <- knn_custom(train_idx, iono_test_x, iono_test_y, k, M_cov)
acc_knn_maha <- sum(iono_test_y == saida_knn_maha)/length(iono_test_y) * 100

When I try to run the code, this is the error I get:

Error: is.numeric(x) || is.logical(x) is not TRUE

RStudio doesn't show me where the error is, so I can't fix it. Is the problem in the comparisons?


Solution

  • It would help if you could share the structure of your data and a few sample observations. I assume your "Ionosphere" dataset is the same as this one: https://www.rdocumentation.org/packages/mlbench/versions/2.1-1/topics/Ionosphere

    If the error is from this line:

    M_cov <- cov(iono)
    

    Check whether iono, at the point where the covariance function is called, contains non-numeric variables (e.g. factors or character strings) or missing values. You can inspect its structure with:

    str(iono)
    

    Exclude all non-numeric variables (such as a class label column) before calling cov to avoid this error.
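
As a minimal sketch of the check described above, the snippet below uses a small made-up data frame (`toy`, not the real ionosphere file) with three numeric columns and one character label column. It assumes the label column is what makes the data non-numeric, which the question does not confirm. It shows that `as.matrix()` on mixed columns produces a character matrix that `cov()` rejects, and that keeping only numeric columns avoids the error. It also uses base R's `solve()` to invert the covariance, because `mahalanobis(..., inverted = TRUE)` expects the inverse of the covariance matrix rather than the covariance itself.

```r
# Made-up stand-in for the data: numeric features plus a character label.
# All names and values here are illustrative assumptions.
set.seed(1)
toy <- data.frame(X1 = rnorm(10), X2 = rnorm(10), X3 = rnorm(10),
                  X4 = rep(c("g", "b"), 5), stringsAsFactors = FALSE)

# as.matrix() on mixed column types yields a character matrix,
# so cov() stops with an error instead of returning a matrix.
toy_mat <- as.matrix(toy)
cov_error <- tryCatch(cov(toy_mat), error = function(e) conditionMessage(e))
print(cov_error)

# Keep only the numeric columns before computing the covariance.
num_cols <- vapply(toy, is.numeric, logical(1))
toy_x <- as.matrix(toy[, num_cols])
M_cov <- cov(toy_x)

# With inverted = TRUE, the 'cov' argument must already be the inverse.
inv_m_cov <- solve(M_cov)
d <- mahalanobis(x = toy_x, center = toy_x[1, ], cov = inv_m_cov,
                 inverted = TRUE)
print(d)
```

If the covariance matrix turns out to be singular, `solve()` will fail; applying `pseudoinverse()` from the corpcor package (already loaded in the question) to the covariance matrix is one possible alternative.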