
Similarity between lists of floats


I have a list of floats that I want to compare to other lists to get a similarity ratio in Python:

The list that I want to compare:

[0.0000,0.0003,-0.0001,0.0002, 0.0001,0.0003,0.0000,0.0000, -0.0002,0.0002,-0.0002,0.0002, 0.0000,0.0000,-0.0002,0.0000, 0.0000,0.0000,-0.0002,-0.0001]

One of the other lists:

[0.0000,0.0002,0.0000,0.0001, 0.0003,0.0005,0.0000,0.0000, 0.0001,0.0003,-0.0001,0.0002, 0.0002,0.0003,-0.0001,0.0002, 0.0002,0.0005,-0.0010,0.0000]

I tried converting them to strings and using the fuzzywuzzy library, python-Levenshtein, and difflib to compare the strings and get a ratio, but this does not give me the results I want, and these approaches are very slow. I searched and couldn't find anything about this.

What is the best way to compare two lists of floats?

I am asking to know whether there is a native way to compare float lists for similarity or a library that does the job, like the many examples of string comparison.


Solution

  • The question is not exactly clear in my opinion; nevertheless, you could see if the following approach helps you:

    import numpy as np
    
    # The reference list and a candidate that differs from it only slightly
    l1 = np.array([0.0000,0.0003,-0.0001,0.0002, 0.0001,0.0003,0.0000,0.0000, -0.0002,0.0002,-0.0002,0.0002, 0.0000,0.0000,-0.0002,0.0000, 0.0000,0.0000,-0.0002,-0.0001])
    l2 = np.array([0.0000,0.0002,0.0000,0.0001, 0.0003,0.0005,0.0000,0.0000, 0.0001,0.0003,-0.0001,0.0002, 0.0002,0.0003,-0.0001,0.0002, 0.0002,0.0005,-0.0010,0.0000])
    
    # Mean squared error: the average of the squared element-wise differences
    mse1 = ((l1 - l2)**2).mean()
    # 6.699999999999999e-08
    
    # Same reference list, but the candidate's first three values are far off
    l1 = np.array([0.0000,0.0003,-0.0001,0.0002, 0.0001,0.0003,0.0000,0.0000, -0.0002,0.0002,-0.0002,0.0002, 0.0000,0.0000,-0.0002,0.0000, 0.0000,0.0000,-0.0002,-0.0001])
    l2 = np.array([1.0000,1.0002,1.0000,0.0001, 0.0003,0.0005,0.0000,0.0000, 0.0001,0.0003,-0.0001,0.0002, 0.0002,0.0003,-0.0001,0.0002, 0.0002,0.0005,-0.0010,0.0000])
    
    mse2 = ((l1 - l2)**2).mean()
    # 0.15000006700000001
    
    # The closer candidate has the smaller error
    mse1 < mse2
    # True
    

    You won't get a value between 0 and 1, but you can compare the results: the more similar two lists are, the closer the value is to 0. MSE stands for mean squared error. There are many other metrics that could be relevant to you, such as MSLE (mean squared logarithmic error), MAE (mean absolute error), etc.
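
If you do need a bounded score, here is a minimal sketch of two options, assuming NumPy is available. The helper names `mae`, `cosine_similarity`, and `bounded_similarity`, as well as the `scale` parameter, are made up for illustration and are not from any library. Cosine similarity compares the direction of the two lists and lies in [-1, 1], while `bounded_similarity` squashes the MAE into (0, 1], with 1 meaning the lists are identical.

```python
import numpy as np


def mae(a, b):
    """Mean absolute error: average of |a_i - b_i|; 0 means identical lists."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.abs(a - b).mean())


def cosine_similarity(a, b):
    """Cosine of the angle between a and b, in [-1, 1]; 1 means same direction."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # The angle is undefined when either list is all zeros
        raise ValueError("cosine similarity is undefined for an all-zero list")
    return float(np.dot(a, b) / denom)


def bounded_similarity(a, b, scale):
    """Map the MAE to (0, 1]: 1 for identical lists, approaching 0 as they diverge.

    `scale` is a hypothetical tuning parameter: the error size you consider
    "typical" for your data (an MAE equal to `scale` gives a score of 0.5).
    """
    return 1.0 / (1.0 + mae(a, b) / scale)


l1 = [0.0000, 0.0003, -0.0001, 0.0002, 0.0001, 0.0003, 0.0000, 0.0000,
      -0.0002, 0.0002, -0.0002, 0.0002, 0.0000, 0.0000, -0.0002, 0.0000,
      0.0000, 0.0000, -0.0002, -0.0001]
l2 = [0.0000, 0.0002, 0.0000, 0.0001, 0.0003, 0.0005, 0.0000, 0.0000,
      0.0001, 0.0003, -0.0001, 0.0002, 0.0002, 0.0003, -0.0001, 0.0002,
      0.0002, 0.0005, -0.0010, 0.0000]

print(mae(l1, l2))
print(cosine_similarity(l1, l2))
print(bounded_similarity(l1, l2, scale=1e-4))
```

Which one fits depends on what "similar" means for your data: cosine similarity ignores overall magnitude, whereas the MAE-based score is sensitive to it and needs a `scale` chosen to match the size of your values.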