I am trying to use the DTW algorithm from the Similarity Measures library. However, I get hit with an error that states a 2-Dimensional Array is required. I am not sure I understand how to properly format the data, and the documentation is leaving me scratching my head.
https://github.com/cjekel/similarity_measures/blob/master/docs/similaritymeasures.html
According to the documentation, the function takes two arguments (exp_data and num_data) for the data sets, which makes sense. What doesn't make sense to me is:
exp_data : array_like
Curve from your experimental data. exp_data is of (M, N) shape, where M is the number of data points, and N is the number of dimensions
This is the same for both the exp_data and num_data arguments.
So, for further clarification, let's say I am implementing the fastdtw library. It looks like this:
import numpy as np
from fastdtw import fastdtw
from scipy.spatial.distance import euclidean
x = np.array([1, 2, 3, 3, 7])
y = np.array([1, 2, 2, 2, 2, 2, 2, 4])
distance, path = fastdtw(x, y, dist=euclidean)
print(distance)
print(path)
Or I can implement the same code with dtaidistance:
from dtaidistance import dtw
x = [1, 2, 3, 3, 7]
y = [1, 2, 2, 2, 2, 2, 2, 4]
distance = dtw.distance(x, y)
print(distance)
However, using this same code with Similarity Measures results in an error. For example:
import similaritymeasures
import numpy as np
x = np.array([1, 2, 3, 3, 7])
y = np.array([1, 2, 2, 2, 2, 2, 2, 4])
dtw, d = similaritymeasures.dtw(x, y)
print(dtw)
print(d)
So, my question is: why is a 2-dimensional array required here? What is similaritymeasures doing that the other libraries are not?
And if similaritymeasures requires data of (M, N) shape, where M is the number of data points and N is the number of dimensions, then where does my data go? Phrased differently: M is the number of data points, so in the above examples x has 5 data points. N is the number of dimensions, and in the above examples x has one dimension. So am I supposed to pass an array of shape (5, 1)? That doesn't seem right to me, but I can't find any sample code that makes this any clearer.
My reason for wanting to use similaritymeasures is that it has multiple other functions that I would like to leverage, such as Fréchet distance and Hausdorff distance. I'd really like to understand how to utilize it.
I really appreciate any help.
It appears the solution in my case was to include the index in the array, so that every data point becomes an [index, value] pair. For example, if your data looks like this:
x = [1, 2, 3, 3, 7]
y = [1, 2, 2, 2, 2, 2, 2, 4]
It needs to look like this:
x = [[1, 1], [2, 2], [3, 3], [4, 3], [5, 7]]
y = [[1, 1], [2, 2], [3, 2], [4, 2], [5, 2], [6, 2], [7, 2], [8, 4]]
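The same conversion can be done with NumPy instead of typing the pairs by hand. This is only a minimal sketch: `to_curve` is a hypothetical helper name of my own, not part of similaritymeasures, and it assumes a 1-based index like the lists above.

```python
import numpy as np

def to_curve(values):
    """Hypothetical helper: pair each value with a 1-based index, giving shape (M, 2)."""
    values = np.asarray(values)
    index = np.arange(1, len(values) + 1)
    # Column 0 is the index, column 1 is the original value.
    return np.column_stack((index, values))

P = to_curve([1, 2, 3, 3, 7])
Q = to_curve([1, 2, 2, 2, 2, 2, 2, 4])
print(P.shape, Q.shape)  # (5, 2) (8, 2)
```

P and Q then have the (M, N) layout the documentation describes, with N = 2.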
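In my case, x and y were two separate columns in a pandas DataFrame. My solution was as follows:
import numpy as np
import similaritymeasures
# df is the existing DataFrame that holds both series
df['index'] = df.index
# Pair each value with its index so that each curve has shape (M, 2)
x1 = df['index']
y1 = df['column1']
P = np.array([x1, y1]).T
x2 = df['index']
y2 = df['column2']
Q = np.array([x2, y2]).T
# dtw is the DTW distance, d is the cumulative distance matrix
dtw, d = similaritymeasures.dtw(P, Q)
print(dtw)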
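As for why a 2-D array is needed at all: my understanding is that the library treats every row as a point in N-dimensional space and measures distances between points. The sketch below is a plain textbook DTW written with NumPy to illustrate that idea. It is not the similaritymeasures implementation, and `naive_dtw` is a name I made up for this example.

```python
import numpy as np

def naive_dtw(P, Q):
    """Textbook DTW cost between curves P (M, N) and Q (K, N). Illustrative only."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.ndim != 2 or Q.ndim != 2:
        raise ValueError("both curves must be 2-D arrays of shape (points, dimensions)")
    # Euclidean distance between every row (point) of P and every row of Q.
    cost = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2)
    m, k = cost.shape
    # Accumulated cost matrix with a border of infinities for the boundary cases.
    acc = np.full((m + 1, k + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, k + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[m, k]

curve = [[1, 1], [2, 2], [3, 3], [4, 3], [5, 7]]
print(naive_dtw(curve, curve))  # 0.0
```

Because the distance is taken between rows, a flat list of numbers has no rows to compare, which is consistent with the error about needing a 2-D array.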