python geolocation cluster-analysis latitude-longitude distance-matrix

Finding Clusters with Max Intra-Distance Based on Geo Co-ordinates


I have a dataset that contains latitude/longitude data.

('ID','Latitude','Longitude')

('A0001',19.222,71.555)

Using this data I have computed the distance matrix, where M[i][j] is the distance between ID:i and ID:j.

The distance is computed using the code below:

geopy.distance.vincenty((a,b),(c,d)).miles
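
For reference, `vincenty` was removed in geopy 2.0, and `geopy.distance.geodesic` is its replacement. The sketch below shows one way the matrix M could be built without geopy at all. It swaps in the haversine (spherical) formula, which is an assumption and not the asker's method, so values will differ slightly from ellipsoidal distances. The names `haversine_miles`, `distance_matrix`, and `EARTH_RADIUS_MILES` are illustrative.

```python
from math import asin, cos, radians, sin, sqrt

# Mean Earth radius in miles; an assumption of the spherical model used here.
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(p, q):
    """Great-circle distance in miles between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(radians, p)
    lat2, lon2 = map(radians, q)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(h))


def distance_matrix(points):
    """Symmetric matrix M where M[i][j] is the distance between points i and j."""
    n = len(points)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Distance is symmetric, so fill both triangles at once.
            M[i][j] = M[j][i] = haversine_miles(points[i], points[j])
    return M
```

Calling `distance_matrix([(19.222, 71.555), (20.222, 71.555)])` gives a 2x2 matrix with zeros on the diagonal and the pairwise distance in miles off the diagonal.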

What is the best way to find clusters whose points all lie within a radius of X miles?

Most common clustering algorithms, such as DBSCAN and K-Means, provide options for minimum distance and minimum samples; however, I am looking for a clustering method that enforces a maximum distance.

Secondly, I am fine with not computing the distance matrix if it is not required.


Solution

  • Do complete linkage hierarchical clustering.

    If you cut the tree at distance x, any two points in the same cluster will be at most x apart. The result is not optimal (finding an optimal clustering would be NP-complete), but it is usually good enough.
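
    A minimal sketch of this approach, assuming SciPy and NumPy are available and that M is a precomputed symmetric distance matrix in miles with a zero diagonal. The function name `cluster_within` and the parameter `max_miles` are illustrative, not part of any library.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_within(M, max_miles):
    """Label points so that any two points sharing a label are at most max_miles apart."""
    M = np.asarray(M, dtype=float)
    # linkage() expects the condensed (upper-triangle) form of the matrix.
    # checks=False skips validation, so M is assumed symmetric with a zero diagonal.
    condensed = squareform(M, checks=False)
    # Complete linkage merges clusters by their largest pairwise distance.
    Z = linkage(condensed, method="complete")
    # Cutting the tree at max_miles bounds the largest distance inside each cluster.
    return fcluster(Z, t=max_miles, criterion="distance")


# Toy matrix: points 0 and 1 are close, points 2 and 3 are close, the rest are far apart.
M = [
    [0, 5, 50, 50],
    [5, 0, 50, 50],
    [50, 50, 0, 4],
    [50, 50, 4, 0],
]
labels = cluster_within(M, max_miles=10)
```

    With this toy matrix, points 0 and 1 share one label and points 2 and 3 share another. Note that the threshold bounds the pairwise distance within a cluster (its diameter), not the distance from a cluster centre.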