I have two binary lists that I'm trying to compare. To compare them, I sum the positions where the corresponding values are equal and convert this to a percentage:
import numpy as np
l1 = [1,0,1]
l2 = [1,1,1]
print(np.dot(l1 , l2) / len(l1) * 100)
prints 66.666
So in this case l1 and l2 have a closeness of 66.666. As the lists become less similar, the closeness value decreases.
For example using values :
l1 = [1,0,1]
l2 = [0,1,0]
returns 0.0
How can I plot l1 and l2 in a way that describes the relationship between them? Is there a name for this method of measuring similarity between binary values?
Using a scatter :
import matplotlib.pyplot as plt
import pandas as pd
plt.scatter('x', 'y', data=pd.DataFrame({'x': l1, 'y': l2}))
produces :
But this plot does not seem to make sense?
Update :
"if both entries are 0, this will not contribute to your 'similarity'"
The updated code below computes a similarity measure that also counts corresponding 0 values in the final score.
import numpy as np
l1 = [0,0,0]
l2 = [0,1,0]
print(len([a for a in np.isclose(l1 , l2) if(a)]) / len(l1) * 100)
which returns :
66.66666666666666
Alternatively, the code below uses normalized_mutual_info_score, which returns 1.0 both for lists that are the same and for lists that are different. Does that mean normalized_mutual_info_score is not a suitable similarity measure?
from sklearn.metrics.cluster import normalized_mutual_info_score
l1 = [1,0,1]
l2 = [0,1,0]
print(normalized_mutual_info_score(l1 , l2))
l1 = [0,0,0]
l2 = [0,0,0]
print(normalized_mutual_info_score(l1 , l2))
prints :
1.0
1.0
No, the plot does not make sense. What you are doing is essentially an inner product between vectors. Under this metric, l1 and l2 are treated as vectors in a 3D space (in this case), and it measures whether they point in the same or a similar direction and have similar length. The output is a scalar value, so there's nothing to plot.
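To make the scalar nature of this measure concrete, here is a minimal sketch in plain Python. The helper name dot_similarity is made up for illustration; it reproduces the np.dot(l1, l2) / len(l1) * 100 computation from the question. Note that only positions where both lists hold a 1 add to the score, so two all-zero lists score 0.

```python
def dot_similarity(a, b):
    """Inner product of two equal-length 0/1 lists, as a percentage of the length.

    Hypothetical helper: mirrors np.dot(l1, l2) / len(l1) * 100 from the question.
    """
    if len(a) != len(b):
        raise ValueError("lists must have the same length")
    # x * y is 1 only when both entries are 1, so matching zeros add nothing.
    return sum(x * y for x, y in zip(a, b)) / len(a) * 100

score_close = dot_similarity([1, 0, 1], [1, 1, 1])     # two positions where both are 1
score_opposite = dot_similarity([1, 0, 1], [0, 1, 0])  # no position where both are 1
score_zeros = dot_similarity([0, 0, 0], [0, 0, 0])     # identical lists, but still 0

print(score_close, score_opposite, score_zeros)
```

Each call collapses a pair of lists into one number, which is why there is no curve to draw from a single comparison.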
If you want to show the individual contribution of each component, you could do something like
contributions = [a==b for a, b in zip(l1, l2)]
plt.plot(list(range(len(contributions))), contributions)
but I'm still not sure that this makes sense.
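As a rough sketch of the per-position idea, the snippet below computes the 0/1 agreement at each index and the share of agreeing positions. That share is commonly called the simple matching coefficient, which is one minus the normalized Hamming distance. The function names are made up for illustration, and the input lists are the ones from the update in the question.

```python
def matching_contributions(a, b):
    """Return 1 for each position where the two lists agree, else 0."""
    if len(a) != len(b):
        raise ValueError("lists must have the same length")
    return [int(x == y) for x, y in zip(a, b)]

def simple_matching_percent(a, b):
    """Share of agreeing positions (matching zeros count too), as a percentage."""
    contributions = matching_contributions(a, b)
    return sum(contributions) / len(contributions) * 100

l1 = [0, 0, 0]
l2 = [0, 1, 0]
contributions = matching_contributions(l1, l2)
percent = simple_matching_percent(l1, l2)

print(contributions)
print(percent)
```

If a picture is still wanted, contributions is the natural thing to draw, for instance with plt.bar(range(len(contributions)), contributions), since it shows which positions agree rather than a single summary number.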