google-maps, cluster-analysis, geospatial, latitude-longitude, hierarchical-clustering

How to cluster latitude-longitude data based on fixed radius from centroid as the only constraint?


I have around 200k latitude and longitude data points. How can I cluster them so that every point in a cluster lies strictly within a radius of 1 km from that cluster's centroid?

I tried the leadercluster algorithm/package in R, but even though I specify radius = 1 km, it does not strictly enforce it: it produces clusters in which many points are 5 to 10 km from the cluster centroid. So it does not meet my requirement.

The number of points in a cluster can vary; that is not a problem.

Is there a way to enforce the strict radius constraint in hierarchical or another clustering algorithm? I am looking for the steps and an implementation in R/Python. I searched Stack Overflow but couldn't find a solution in R/Python.

How can I visualize the cluster centroids in Google Maps after the clustering is done?

EDIT

These are the parameters I am using in ELKI. Please verify. [screenshot]


Solution

  • This is not so much a clustering problem as a set-cover type of problem, at least if you are looking for a good cover. A clustering algorithm is about finding structure in your data, whereas you are looking for a forced quantization.

    Anyway, here are two strategies you can try, e.g. in ELKI:

    • Canopy preclustering with T1 = T2 = your radius. This should yield a greedy approximation to the cover scenario.
    • Complete-linkage hierarchical agglomerative clustering, cut at the desired height. This is fairly expensive (O(n^3)). Any two points in the same cluster are at most this distance apart, so this is a bit stricter than your requirement.
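To make the first strategy concrete outside ELKI, here is a minimal Python sketch of a canopy-style greedy pass with T1 = T2 = radius. It is a simplified, partitioning variant and not ELKI's implementation. The names `haversine_km`, `greedy_radius_cover` and `sample_points` are made up for illustration, and the Earth radius constant is an assumption. Note that each cluster is anchored on one of the data points, so the radius guarantee holds relative to that anchor point rather than to the mean of the members.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to guard against tiny floating-point overshoot above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def greedy_radius_cover(points, radius_km=1.0):
    """Canopy-style pass with T1 = T2 = radius_km.

    Repeatedly takes the first unassigned point as a center and assigns
    every unassigned point within radius_km of it to that cluster.
    Returns a list of (center, members) pairs.
    """
    remaining = list(points)
    clusters = []
    while remaining:
        center = remaining[0]
        members, rest = [], []
        for p in remaining:
            if haversine_km(center, p) <= radius_km:
                members.append(p)  # includes the center itself (distance 0)
            else:
                rest.append(p)
        clusters.append((center, members))
        remaining = rest
    return clusters


# Hypothetical sample data: two nearby points and one several km away.
sample_points = [(12.9716, 77.5946), (12.9720, 77.5950), (13.0500, 77.6000)]
for center, members in greedy_radius_cover(sample_points, radius_km=1.0):
    print(center, len(members))
```

This loop computes roughly (number of points) × (number of clusters) distances, so for 200k points a spatial index or a coarse grid pre-bucketing would likely be needed in practice. The result also depends on the order of the input points, which is expected for a greedy cover.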

    Beware that you should be using haversine ("geo") distances, not Euclidean!
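To illustrate why this matters, the small sketch below compares the same longitude difference at two latitudes. The function name and the Earth radius value are assumptions chosen for the example. A Euclidean distance on raw degrees would report 0.01 in both cases, while the actual ground distance differs by a factor of about two.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0088):
    """Great-circle distance in km; inputs are in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


# The same 0.01 degree step in longitude, at the equator and at 60 degrees north.
d_equator = haversine_km(0.0, 0.0, 0.0, 0.01)
d_60n = haversine_km(60.0, 0.0, 60.0, 0.01)
print(f"equator: {d_equator:.3f} km, 60N: {d_60n:.3f} km")
```

The step is about 1.11 km at the equator but only about 0.56 km at 60 degrees north, so a fixed threshold in degrees does not correspond to a fixed radius in kilometres.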