I want to group a list of longitude/latitude points (my_long_lats) around predetermined center points (my_center_Points).
When I run:
k <- kmeans(as.matrix(my_long_lats), centers = as.matrix(my_center_Points))
k$centers does not equal my_center_Points.
I assume k-means has adjusted my center points to the optimal center. But what I need is for my_center_Points to not change and group my_long_lats around them.
In this link they talk about setting initial centers, but how do I set centers that won't change once I run k-means? Or is there a better clustering algorithm for this?
I could even settle for minimizing the movement of the centers.
I still have a lot to learn in R, so any help is really appreciated.
Here is the calculation using the geosphere library to properly compute the distance from latitude and longitude. The variable closestcenter is the result, which identifies the closest center to each point.
#define random data
centers<-data.frame(x=c(44,44, 50, 50), y=c(44, 50, 44, 50))
pts<-data.frame(x=runif(50, 40, 55), y=runif(50, 40, 55))
library(geosphere)
#calculate the distance matrix - from each defined center to every point
#columns represent centers and the rows are the data points
#distGeo reads column 1 as longitude and column 2 as latitude, so x is longitude and y is latitude here
dm<-sapply(seq_len(nrow(centers)), function(i) distGeo(centers[i,], pts))
#find the column with the smallest distance
closestcenter<-apply(dm, 1, which.min)
#color code the original data for verification
colors<-c("black", "red", "blue", "green")
plot(pts , col=colors[closestcenter], pch=19)
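
The same nearest-center assignment can probably be wrapped into a small reusable function. The sketch below is only one way to do it: assign_to_centers is a made-up helper name, the lon/lat column names are my assumption, and the data frames are random stand-ins for the my_long_lats and my_center_Points from the question. It uses distm from geosphere to build the whole distance matrix in one call, so the centers are never moved; each point is simply labelled with the index of its closest center.

```r
library(geosphere)

# Hypothetical helper (not part of geosphere): label each point with the
# row index of its nearest fixed center.
# Assumption: both inputs have longitude in column 1 and latitude in column 2.
assign_to_centers <- function(points, centers) {
  # rows = points, columns = centers, values = geodesic distance in metres
  dm <- distm(as.matrix(points), as.matrix(centers), fun = distGeo)
  # for every point, pick the column (center) with the smallest distance
  unname(apply(dm, 1, which.min))
}

# Stand-in data; replace with your own my_center_Points and my_long_lats
my_center_Points <- data.frame(lon = c(44, 44, 50, 50), lat = c(44, 50, 44, 50))
set.seed(1)
my_long_lats <- data.frame(lon = runif(50, 40, 55), lat = runif(50, 40, 55))

cluster <- assign_to_centers(my_long_lats, my_center_Points)

# number of points assigned to each center
table(cluster)
```

Storing the result as a column, for example my_long_lats$cluster <- cluster, keeps the grouping next to the coordinates for later use.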