python similarity outliers

Identify outliers from similarity matrix


I have an n x n numpy matrix where element [i,j] is the similarity score between two objects (from 0 to 1, with 1 being identical and 0 being opposite). In this case I'm analyzing color palettes, so it's the similarity score between color palette i and color palette j. I would like to determine which of the objects are "outliers" (using the term loosely here). The closest I've been able to come up with is using something like DBSCAN and seeing which objects don't fit into any cluster. Is there a better way of going about this?


Solution

  • I'd go for Markov clustering.

    Essentially, the algorithm performs a random walk on a graph.

    Random walks are super easy to implement if you have the proximity matrix. The algorithm is roughly:

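    1. Normalize the matrix so that each row (or each column) sums to 1, which turns it into a matrix of transition probabilities.
    2. Raise it to a large power. This has to be a matrix power, e.g. np.linalg.matrix_power(M, n); for a numpy array, M**n is an elementwise power.
    3. Look at the strength of the connections between nodes. Nodes that stay weakly connected to everything else are the outlier candidates.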
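
The three steps above can be sketched in numpy as follows. This is only the simplified random-walk outline, not full Markov clustering (which also alternates the power step with an "inflation" step). The function name `random_walk_scores`, the `steps` parameter, the choice to zero the diagonal, and the toy matrix are all assumptions made for illustration.

```python
import numpy as np

def random_walk_scores(sim, steps=3):
    """Average probability of a `steps`-step random walk ending at each node."""
    adj = np.array(sim, dtype=float)   # copy, so the caller's matrix is untouched
    np.fill_diagonal(adj, 0.0)         # assumption: ignore self-similarity
    # 1. Normalize: each row sums to 1, giving transition probabilities.
    transition = adj / adj.sum(axis=1, keepdims=True)
    # 2. Raise to a power with a true matrix product (not elementwise **).
    walked = np.linalg.matrix_power(transition, steps)
    # 3. Connection strength: mean probability of arriving at each node.
    return walked.mean(axis=0)

# Toy data (made up): palettes 0-2 resemble each other, palette 3 does not.
sim = np.array([
    [1.00, 0.90, 0.80, 0.10],
    [0.90, 1.00, 0.85, 0.20],
    [0.80, 0.85, 1.00, 0.15],
    [0.10, 0.20, 0.15, 1.00],
])
scores = random_walk_scores(sim)
outlier = int(np.argmin(scores))  # index of the least-visited palette
```

Objects with the lowest scores are the ones the walk rarely reaches, so they are the natural outlier candidates. Where to draw the cutoff is a judgment call. The sketch also assumes every object has at least one nonzero similarity to another object; otherwise the row normalization divides by zero.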