I know that scipy.cluster.hierarchy focused on dealing with the distance matrix. But now I have a similarity matrix... After I plot it by using Dendrogram, something weird just happens. Here is the code:
similarityMatrix = np.array(([1,0.75,0.75,0,0,0,0],
[0.75,1,1,0.25,0,0,0],
[0.75,1,1,0.25,0,0,0],
[0,0.25,0.25,1,0.25,0.25,0],
[0,0,0,0.25,1,1,0.75],
[0,0,0,0.25,1,1,0.75],
[0,0,0,0,0.75,0.75,1]))
here is the linkage method
Z_sim = sch.linkage(similarityMatrix)
plt.figure(1)
plt.title('similarity')
sch.dendrogram(
Z_sim,
labels=['1','2','3','4','5','6','7']
)
plt.show()
But here is the outcome:
My question is:
Thank you very much for your help!
linkage
expects "distances", not "similarities". To convert your matrix to something like a distance matrix, you can subtract it from 1:
dist = 1 - similarityMatrix
linkage
does not accept a square distance matrix. It expects the distance data to be in "condensed" form. You can get that using scipy.spatial.distance.squareform
:
from scipy.spatial.distance import squareform
dist = 1 - similarityMatrix
condensed_dist = squareform(dist)
Z_sim = sch.linkage(condensed_dist)
(When you pass a two-dimensional array with shape (m, n) to linkage
, it treats the rows as points in n-dimensional space, and computes the distances internally.)