Search code examples
rmahalanobis

Custom Mahalanobis Distance implementation not returning the correct distance


I'm developing an R function that calculates the Mahalanobis Distance. Yeah, I know the mahalanobis() function does it already but I need to implement it "by hand". The current implementation I have is this (it gives different values compared to the R function):

#'mahalanobis_distance
#'
#'Calculates the mahalanobis_distance given the input data
#'
#'@param point Point to calculate the mahalanobis_distance
#'@param sample_mean Sample mean
#'@param sample_covariance_matrix Sample Covariance Matrix
#'
#'@return Mahalanobis distance associated to the point
#'
#'@examples
#'
#'@export
mahalanobis_distance <- function(value, sample_mean, sample_covariance_matrix) {
  # Ensure value and sample_mean are vectors
  value <- as.matrix(value)
  sample_mean <- as.matrix(sample_mean)

  if (ncol(value) != ncol(sample_mean)) {
    stop("Dimensions of value and sample_mean do not match")
  }

  # Calculate the difference vector
  diff_vector <- value - sample_mean

  # Calculate the inverse of the covariance matrix
  inverted_covariance_matrix <- solve(sample_covariance_matrix)

  # Calculate the Mahalanobis distance
  mahalanobis_dist <- sqrt(t(diff_vector) %*% inverted_covariance_matrix %*% diff_vector)

  return(as.numeric(mahalanobis_dist))
}

I've been trying it out with this part of my code:

mahalanobis_mat_vector = c();
for(i in 1:nrow(inputData)){
    mahalanobis_mat_vector = c(mahalanobis_mat_vector, mahalanobis_distance(inputData[i,],    sampleMeans, covariance_matrix));
}
distances = mahalanobis_mat_vector;
message("My mahalanobis distance vector: ");
print(distances);
distances = mahalanobis(inputData, sampleMeans, covariance_matrix);
message("R own function mahalanobis(): ");
print(distances);

Which gives me different results for my own function and the R implemented mahalanobis distance function...

My mahalanobis distance vector: 
[1] 1.5243357 2.1412611 0.6774662 0.3867442 0.2811000 0.1497338 2.0931872
R own function mahalanobis(): 
[1] 2.32359939 4.58499903 0.45896039 0.14957109 0.07901719 0.02242021 4.38143269

The inputData can be obtained with this code:

inputData = t(matrix(c(3,2,3.5,12,4.7,4.1,5.2,4.9,7.1,6.1,6.2,5.2,14,5.3),2,7,dimnames=list(c("r","d"))));
inputData = data.frame(inputData);
if (is.data.frame(inputData)) {
    inputData = as.matrix(inputData);
} else {
    stop("inputData must be a dataframe")
}

Maybe I've implemented something wrong in my function.


Solution

  • Your function is correct. Look at the documentation for mahalanobis() (in R studio, enter ?mahalanobis in the console):

    Returns the squared Mahalanobis distance of all rows in x and the vector μ = center with respect to Σ = cov.

    The R function calculates the squared Mahalanobis distance D2, whereas your function calculates the Mahalanobis distance D. Taking the square root of the values given by mahalanobis() gives:

    [1] 1.5243357 2.1412611 0.6774662 0.3867442 0.2811000 0.1497338 2.0931872
    

    Equivalent to the output of your function.