R - Clustering (K-means) within groups

I need help clustering my data within assigned groups...

I have the following dataframe:

# Generate data frame
set.seed(1)
df1 <- data.frame(
  start.x = sample(1:20),
  start.y = sample(1:20),
  end.x = sample(1:20),
  end.y = sample(1:20)
)

I've used K-means to group it:

# Group using K-means
groups <- kmeans(df1[,c('start.x', 'start.y', 'end.x', 'end.y')], 4)
df1$group <- as.factor(groups$cluster)

Now I want to use K-means again to cluster it within the groups I've just created and assign the results to a new column in the dataframe.

Does anyone know how to do this, or have a shorter way to complete both steps at once?

Thanks...

Solution

We can split the data by the first group and run kmeans on each subset separately. Choose k for the second pass with care, though: every subset needs more rows than k, and the subset sizes depend on how the first clustering turns out.

library(dplyr)
library(purrr)

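cols <- c('start.x', 'start.y', 'end.x', 'end.y')

df1 %>%
  # first pass: cluster all rows into 4 groups and split the data by cluster
  group_split(group = kmeans(.[, cols], 4)$cluster) %>%
  # second pass: cluster each piece into 2 sub-groups and row-bind the pieces
  map_df(~ .x %>% mutate(new_group = kmeans(.x[, cols], 2)$cluster))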
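To make the point about choosing k concrete, here is a minimal sketch that inspects the sizes of the first-level groups before the second pass. It assumes the default kmeans algorithm, where each group needs more rows than k; the names first_pass, group_sizes and max_k are only illustrative.

```r
# Rebuild the example data so the block runs on its own
set.seed(1)
df1 <- data.frame(
  start.x = sample(1:20),
  start.y = sample(1:20),
  end.x = sample(1:20),
  end.y = sample(1:20)
)
cols <- c("start.x", "start.y", "end.x", "end.y")

# First pass: 4 clusters over all rows
first_pass <- kmeans(df1[, cols], 4)

# Rows per first-level cluster (kmeans also reports this as first_pass$size)
group_sizes <- table(first_pass$cluster)
print(group_sizes)

# Assumption: default algorithm, so each group needs more rows than k.
# The smallest group therefore caps the k that works for every group.
max_k <- min(group_sizes) - 1
print(max_k)
```

If max_k comes out below 2, the second pass cannot split every group, and you would need fewer first-level clusters or special handling for the small groups.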

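In base R, you could use by, which does the split, apply and combine steps in one call. It returns the results in group order rather than in the original row order, so assign them back to the rows sorted by group.

# by() returns results in group order, so write them to the rows in that same order
df1$new_group[order(df1$group)] <- unlist(by(df1, df1$group, function(x) 
        kmeans(x[,c('start.x', 'start.y', 'end.x', 'end.y')], 2)$cluster))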
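Another base R option, sketched here as an alternative rather than as part of the answer above, is to run the second pass over row indices with ave(), which hands back its results in the original row order. The guard for small groups is an assumption added for safety: a group that does not have more rows than k gets the single label 1.

```r
# Rebuild the example data so the block runs on its own
set.seed(1)
df1 <- data.frame(
  start.x = sample(1:20),
  start.y = sample(1:20),
  end.x = sample(1:20),
  end.y = sample(1:20)
)
cols <- c("start.x", "start.y", "end.x", "end.y")

# First pass, as in the question
df1$group <- as.factor(kmeans(df1[, cols], 4)$cluster)

# Second pass: ave() splits the row numbers by group, applies FUN to each
# piece, and returns the results in the original row order
k <- 2
df1$new_group <- ave(seq_len(nrow(df1)), df1$group, FUN = function(i) {
  # i holds the row numbers of one first-level group
  if (length(i) > k) kmeans(df1[i, cols], k)$cluster else rep(1L, length(i))
})

# Sub-cluster labels tabulated against the first-level groups
print(table(df1$group, df1$new_group))
```

The labels in new_group are only meaningful within a group: sub-cluster 1 of group 1 has nothing to do with sub-cluster 1 of group 2.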