python, time-series, signal-processing, accelerometer, similarity

Similarity between two time series


I have two files of accelerometer readings and I want a metric that measures the similarity between them. I have tried Pearson's r, DTW distance, and DTW score. Pearson's r returns 1 if the files are identical; the DTW score and distance are both 0 if the files are identical.

But I need a solution for files like the ones in the figures: similar, with a small time lag. They are readings from two different accelerometers that were attached to the same source. The sampling frequency and amplitude are not the same. Even the number of readings differs, and the time stamps may differ too.

How do I measure the similarity between such files? Is there a metric or measurement I can compute in Python? The DTW score and DTW distance do produce values, but I cannot tell from those values whether the files are similar.

File 1

File 2


Solution

  • You can read the files with pandas and compute the correlation at successive lags. For example, suppose you have the two series as follows:

    import pandas as pd

    # val2 is val1 delayed by 5 samples (zero-padded at the start)
    df = pd.DataFrame({'val1': range(10),
                       'val2': [0]*5 + list(range(5))})
    df
      val1 val2
    0   0   0
    1   1   0
    2   2   0
    3   3   0
    4   4   0
    5   5   0
    6   6   1
    7   7   2
    8   8   3
    9   9   4
    

    You can do

    max([df.val1.corr(df.val2.shift(-delay)) for delay in range(1, len(df))])
    >>> 1.0
    

    This returns 1.0 whenever one series is a delayed copy of the other, because the correlation is recomputed after applying each candidate delay. Note that large delays leave only a few overlapping samples, and a correlation over two points is trivially ±1, so restrict the delay range to values that are plausible for your data. That also avoids looping over every possible delay. Alternatively, use a for loop with a conditional break that stops as soon as the correlation reaches 1.0.
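
The example above assumes both series share one sampling rate. For recordings like the ones in the question, where the rates and amplitudes differ, a rough sketch is to interpolate both signals onto a common time grid first and then scan the lags. The helper names `resample_uniform` and `best_lag_correlation`, the `min_overlap` guard, and the synthetic data are assumptions made for illustration, not part of the original answer. Pearson's r is insensitive to amplitude scaling, so no separate normalisation is needed.

```python
import numpy as np

def resample_uniform(t, x, rate_hz):
    """Linearly interpolate samples (t, x) onto a uniform grid at rate_hz."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    grid = np.arange(t[0], t[-1], 1.0 / rate_hz)
    return grid, np.interp(grid, t, x)

def best_lag_correlation(a, b, max_lag, min_overlap=10):
    """Return (lag, r) maximising Pearson r between a and b shifted by lag.

    lag is in samples; lag > 0 means b is delayed relative to a.
    Lags that leave fewer than min_overlap samples are skipped.
    """
    best_lag, best_r = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = a, b[lag:]
        else:
            x, y = a[-lag:], b
        n = min(len(x), len(y))
        if n < min_overlap:
            continue
        x, y = x[:n], y[:n]
        if x.std() == 0 or y.std() == 0:
            continue  # correlation is undefined for a constant segment
        r = np.corrcoef(x, y)[0, 1]
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r

# Synthetic demo (made-up data): the second sensor samples at half the rate,
# has 3x the amplitude, and lags the first by 0.5 s.
def source(t):
    return np.sin(2 * np.pi * 1.3 * t) + 0.5 * np.sin(2 * np.pi * 0.4 * t)

t1 = np.arange(0, 10, 0.01)   # 100 Hz
t2 = np.arange(0, 10, 0.02)   # 50 Hz
_, a = resample_uniform(t1, source(t1), rate_hz=50)
_, b = resample_uniform(t2, 3 * source(t2 - 0.5), rate_hz=50)

lag, r = best_lag_correlation(a, b, max_lag=100)
print(f"lag = {lag / 50:.2f} s, r = {r:.3f}")
```

The peak correlation can then serve as the similarity score and the lag as the estimated time offset. With noisy real data the peak will fall below 1.0, so any threshold for "similar" has to be chosen for the data at hand.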