r, k-means, geosphere

Set static centers for kmeans in R


I want to group a list of longitude/latitude points (my_long_lats) around predetermined center points (my_center_Points).

When I run:

k <- kmeans(as.matrix(my_long_lats), centers = as.matrix(my_center_Points))

k$centers does not equal my_center_Points.

I assume k-means has moved my center points to the optimal centers. But I need my_center_Points to stay fixed, with my_long_lats grouped around them.

In this link they talk about setting initial centers, but how do I set centers that won't change once I run k-means? Or is there a better clustering algorithm for this?
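
A minimal reproducible sketch of this behaviour, using made-up data in place of my_long_lats and my_center_Points (the values and the seed are assumptions for illustration). As far as I can tell, kmeans treats `centers` only as starting positions and returns the means of the final clusters instead.

```r
set.seed(42)

# made-up stand-ins for the real data, as (longitude, latitude) columns
my_center_Points <- data.frame(long = c(44, 44, 50, 50), lat = c(44, 50, 44, 50))
my_long_lats <- data.frame(long = runif(50, 40, 55), lat = runif(50, 40, 55))

k <- kmeans(as.matrix(my_long_lats), centers = as.matrix(my_center_Points))

# compare the returned centers with the ones that were supplied;
# dimnames are dropped so that only the coordinates are compared
moved <- !isTRUE(all.equal(unname(k$centers), unname(as.matrix(my_center_Points))))
moved
```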

I could even settle for minimizing the movement of the centers.

I still have a lot to learn in R, so any help is really appreciated.


Solution

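  • Since the centers must stay fixed, there is nothing to iterate: each point only needs to be assigned to its nearest center. Here is that calculation, using the geosphere library to compute proper geodesic distances from longitude and latitude.

    The variable closestcenter holds the result: for each point, the index of the closest center.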

    #define random data
    centers<-data.frame(x=c(44,44, 50, 50), y=c(44, 50, 44, 50))
    pts<-data.frame(x=runif(50, 40, 55), y=runif(50, 40, 55))
    
    library(geosphere)
    
    #calculate the distance matrix from each center to every point
    #rows are the data points and columns are the centers
    #distGeo expects (longitude, latitude) order, so x is longitude and y is latitude
    dm<-sapply(seq_len(nrow(centers)), function(i) distGeo(centers[i,], pts))
    
    #find the column with the smallest distance
    closestcenter<-apply(dm, 1, which.min)
    
    #color code the original data for verification
    colors<-c("black", "red", "blue", "green")
    plot(pts , col=colors[closestcenter], pch=19) 
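
The same assignment can be written more compactly with geosphere's `distm`, which builds the full point-by-center distance matrix in one call. This is only a sketch, not part of the calculation above: the helper name `assign_to_centers` and the column names `lon` and `lat` are assumptions for illustration. Both inputs are expected in (longitude, latitude) order.

```r
library(geosphere)

# Hypothetical helper: assign each point to its nearest fixed center.
# points and centers are two-column data frames or matrices (longitude, latitude).
assign_to_centers <- function(points, centers) {
  # one row per point, one column per center, geodesic distances in metres
  dm <- distm(as.matrix(points), as.matrix(centers), fun = distGeo)
  # index of the nearest center for every point
  unname(apply(dm, 1, which.min))
}

# usage with made-up data
set.seed(1)
centers <- data.frame(lon = c(44, 44, 50, 50), lat = c(44, 50, 44, 50))
pts <- data.frame(lon = runif(50, 40, 55), lat = runif(50, 40, 55))

pts$cluster <- assign_to_centers(pts[, c("lon", "lat")], centers)
table(pts$cluster)
```

This is effectively the assignment step of k-means without the update step, so the centers never move.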
    
