python machine-learning plot charts similarity

I have plots of points that I extract from an image. How can I determine a similarity measure between two different plots?


Each point has an x coordinate, a y coordinate, and a size.

For example, these should be considered similar:

Plot 1-A:

Plot 1-B:

And these should not be considered similar:

Plot 2-A:

Plot 3-A:

Are there any algorithms or methods for determining how similar two plots are?

I tried creating a feature vector for each plot, with features such as the number of points in each quadrant, the largest point size, and the distance between the two largest points, and then computing the cosine similarity between the vectors. However, I keep getting high similarity scores for plots that don't match. I also looked into training an ML model for this, specifically a Siamese network, but I can't get it to train correctly.
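A minimal sketch of the feature-vector approach described above, useful for seeing why it can misbehave. It assumes each plot is an array of `(x, y, size)` rows with coordinates normalized to [0, 1], so quadrants are split at a hypothetical center `(0.5, 0.5)`; `plot_features` and `cosine_similarity` are illustrative names, not from any library.

```python
import numpy as np

def plot_features(points, center=(0.5, 0.5)):
    """Hypothetical feature extractor for an (N, 3) array of x, y, size rows.

    Assumes coordinates are normalized so that `center` splits the quadrants.
    """
    pts = np.asarray(points, dtype=float)
    x, y, size = pts[:, 0], pts[:, 1], pts[:, 2]
    cx, cy = center
    # Number of points in each quadrant (Q1..Q4, counter-clockwise from top right).
    quadrants = [
        np.sum((x >= cx) & (y >= cy)),
        np.sum((x < cx) & (y >= cy)),
        np.sum((x < cx) & (y < cy)),
        np.sum((x >= cx) & (y < cy)),
    ]
    # Sort point indices by size, largest first.
    order = np.argsort(size)[::-1]
    largest = size[order[0]]
    # Euclidean distance between the two largest points (0 if only one point).
    dist = np.linalg.norm(pts[order[0], :2] - pts[order[1], :2]) if len(pts) > 1 else 0.0
    return np.array(quadrants + [largest, dist], dtype=float)

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Two hypothetical plots: p2 is p1 mirrored through the center point.
p1 = np.array([[0.2, 0.8, 5], [0.7, 0.7, 3], [0.3, 0.2, 1]])
p2 = np.array([[0.8, 0.2, 5], [0.3, 0.3, 3], [0.7, 0.8, 1]])
print(round(cosine_similarity(plot_features(p1), plot_features(p2)), 2))  # 0.96
```

Even though every point of `p2` lands in a different quadrant, the score is about 0.96 because the unscaled size feature (5) dominates both vectors. And since all of these features are non-negative, cosine similarity can never fall below 0. Both effects plausibly explain the high scores for non-matching plots; scaling each feature to a comparable range before comparing is one possible mitigation.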


Solution

  • You could convert your data into a matrix that covers the entire plane of the x, y space, with each cell of the matrix representing one region of that plane. Large data points are then represented automatically, because they cover more cells. A 1 means something occupies that cell, and a 0 means it is white space.

    For example, if you have a 3x3 space, you might get matrices like these:

    101
    010
    111
    

    and

    111
    011
    101 
    

    You can then compute the cosine similarity between the two matrices (treated as flat vectors) to evaluate how close they are. OpenCV may have functions that help with this; my answer covers the principle you would need to aim for. The main issue you have is representing your data in a way that's conducive to comparison.
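A minimal sketch of the grid idea from this answer. It assumes coordinates are normalized to [0, 1] and that each point's size can be used directly as a radius in the same units (both assumptions); `rasterize`, `grid_cosine_similarity`, and the default `grid_size` are hypothetical. Each point is drawn as a filled disc of 1s in a binary grid, and grids are compared as flattened vectors, which also reproduces the 3x3 example above.

```python
import numpy as np

def rasterize(points, grid_size=64):
    """Hypothetical helper: draw (x, y, radius) points as filled discs in a binary grid.

    Assumes x, y, and radius are all in normalized [0, 1] plot units.
    Row 0 of the grid corresponds to y near 0.
    """
    grid = np.zeros((grid_size, grid_size), dtype=float)
    # Coordinates of every cell center in [0, 1].
    centers = (np.arange(grid_size) + 0.5) / grid_size
    xx, yy = np.meshgrid(centers, centers)  # xx varies along columns, yy along rows
    for x, y, r in points:
        # Mark every cell whose center lies within the point's radius as occupied.
        grid[(xx - x) ** 2 + (yy - y) ** 2 <= r ** 2] = 1.0
    return grid

def grid_cosine_similarity(a, b):
    """Cosine similarity between two occupancy grids, compared as flat vectors."""
    u, v = a.ravel(), b.ravel()
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

# The two 3x3 matrices from the answer: 5 shared 1s, 6 and 7 ones respectively.
A = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 1]], dtype=float)
B = np.array([[1, 1, 1], [0, 1, 1], [1, 0, 1]], dtype=float)
print(round(grid_cosine_similarity(A, B), 4))  # 5 / sqrt(6 * 7) ≈ 0.7715
```

A point whose radius is smaller than about half a cell may mark no cells at all, so `grid_size` should be chosen with the smallest point size in mind. Two plots that differ only by a small shift can also have little exact overlap; blurring both grids slightly (for example with `scipy.ndimage.gaussian_filter`) before comparing is one way to make the score more tolerant of small offsets.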