python arrays csv distance mahalanobis

Calculate distance between 1D array and nD array using python


I'm a beginner in Python and I hope you can help me fix my problem.

I have two files, library.csv (9 columns) and cases.csv (8 columns), which I read with np.loadtxt. I select all columns of library except the last one and put them into an array base, and I put cases.csv into an array problems. I would like to calculate the Mahalanobis distance between each row of problems and every row of base, and store the minimum distance in a table.

This is my code:

# Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sn

from keras.models import load_model
from scipy.spatial import distance

# [1] Get library.csv and cases.csv

library = np.loadtxt("library.csv", delimiter=",")
cases = np.loadtxt("cases.csv", delimiter=",")
problems = np.loadtxt("cases.csv", delimiter=",")  # copy of cases

# Select columns from library to use as base cases, except solutions
base = library[:, range(library.shape[1] - 1)] # Exclude last column (solution)

# Move through all problem cases
for i in range(problems.shape[0]):

    # [3.1] Get inverse covariance matrix for the base cases
    covariance_matrix = np.cov(base)                                # Covariance
    inverse_covariance_matrix = np.linalg.pinv(covariance_matrix)   # Inverse

    # [3.2] Get case row to evaluate
    case_row = problems[i, :]

    # Empty array to store the Mahalanobis distances to each library case
    distances = np.zeros(base.shape[0])

    # [3.3] For each base case row
    for j in range(base.shape[0]):
        # Get base case row
        base_row = base[j, :]

        # [3.4] Calculate Mahalanobis distance between case row and base case row, and store it
        distances[j] = distance.mahalanobis(case_row, base_row, inverse_covariance_matrix)

    # [3.5] Get the index (row) of the minimum value in the calculated distances
    min_distance_row = np.argmin(distances)

But I get this error:

Using TensorFlow backend.

Traceback (most recent call last):
File "C:\Users\HP\Desktop\MyAlgo\mainAlgo.py", line 45, in <module>
distances[j] = distance.mahalanobis(case_row, base_row, inverse_covariance_matrix)
File "C:\Users\HP\AppData\Local\Programs\Python\Python38\lib\site-packages\scipy\spatial\distance.py", line 1083, in mahalanobis
m = np.dot(np.dot(delta, VI), delta)
File "<array_function internals>", line 5, in dot
ValueError: shapes (8,) and (384,384) not aligned: 8 (dim 0) != 384 (dim 0)


Solution

  • The problem seems to be that base_row and case_row each have length 8, while covariance_matrix (and therefore its inverse) is 384 x 384. Those dimensions have to match, otherwise the matrix multiplication inside mahalanobis can't be done.

    I lack knowledge of your data and its statistical properties, but my guess is that you need to transpose base before calculating the covariance matrix. By default, np.cov treats each row of its input as one variable and each column as one observation, so np.cov(base) on 384 rows of 8 values produces a 384 x 384 matrix. Passing base.T, or equivalently calling np.cov(base, rowvar=False), should give the 8 x 8 matrix that matches your rows. https://numpy.org/devdocs/reference/generated/numpy.cov.html
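
A minimal sketch of that fix, under some assumptions: the real CSV files aren't available here, so random arrays with the shapes from the question (384 x 9 for library, and a made-up 5 rows x 8 columns for the cases) stand in for them. It also swaps the two nested loops for scipy's cdist, which is not in the original code but computes the same pairwise Mahalanobis distances in one call.

```python
import numpy as np
from scipy.spatial import distance

# Synthetic stand-ins for library.csv and cases.csv (assumed shapes;
# replace with the np.loadtxt calls from the question).
rng = np.random.default_rng(0)
library = rng.normal(size=(384, 9))
problems = rng.normal(size=(5, 8))

# Drop the last (solution) column -> shape (384, 8)
base = library[:, :-1]

# rowvar=False: each column is a variable, each row an observation,
# so the covariance matrix is 8 x 8 instead of 384 x 384.
covariance_matrix = np.cov(base, rowvar=False)
inverse_covariance_matrix = np.linalg.pinv(covariance_matrix)

# One distance per (case row, base row) pair -> shape (5, 384)
all_distances = distance.cdist(
    problems, base, metric="mahalanobis", VI=inverse_covariance_matrix
)

# For each case: index of the closest base row and the distance to it
min_distance_rows = all_distances.argmin(axis=1)
min_distances = all_distances.min(axis=1)
```

The covariance matrix only depends on base, so it is computed once rather than on every pass through the loop. If you prefer to keep the explicit loops, changing only the np.cov call should be enough to make distance.mahalanobis accept the rows.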