Sampling based on specified column values in R

I have data like this, where Average is the average of X, Y, and Z.

head(df)
ID  X   Y   Z   Average
A   2   2   5   3
A   4   3   2   3
A   4   3   2   3
B   5   3   1   3
B   3   4   2   3
B   1   5   3   3
C   5   3   1   3
C   2   3   4   3
C   5   3   1   3
D   2   3   4   3
D   3   2   4   3
D   3   2   4   3
E   5   3   1   3
E   4   3   2   3
E   3   4   2   3

To reproduce this, we can use

df <- data.frame(ID = c("A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D", "E", "E", "E"),
                     X = c(2L, 4L, 4L, 5L, 3L,1L, 5L, 2L, 5L, 2L, 3L, 3L, 5L, 4L, 3L),
                     Y = c(2L, 3L, 3L, 3L,4L, 5L, 3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 4L), 
                     Z = c(5L, 2L, 2L,1L, 2L, 3L, 1L, 4L, 1L, 4L, 4L, 4L, 1L, 2L, 2L), 
                     Average = c(3L,3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L))

From this, I want to extract one observation per ID such that, as far as possible, the same combination of X, Y, and Z values is not repeated across the sampled rows. I tried

library(dplyr)
df %>% sample_n(size = nrow(.), replace = FALSE) %>% distinct(ID, .keep_all = TRUE)

But on a larger dataset, I see too many repetitions of the same combination of X, Y, and Z. To the extent possible, I need output in which the cases (i.e., the combinations of X, Y, and Z) are equally or nearly equally represented, like this:

   ID   X   Y   Z   Average
    A   2   2   5   3
    B   5   3   1   3
    C   2   3   4   3
    D   3   2   4   3
    E   4   3   2   3
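
To make the repetition concrete, here is a minimal sketch in base R that performs the same shuffle-then-keep-first-row step as the attempt above and then tabulates how often each X, Y, Z combination was drawn. The names `shuffled`, `picked`, and `combo_counts` are illustrative only, and the base R calls are a stand-in for the dplyr pipeline, not a copy of it.

```r
# Example data from the question.
df <- data.frame(
  ID = rep(c("A", "B", "C", "D", "E"), each = 3),
  X = c(2L, 4L, 4L, 5L, 3L, 1L, 5L, 2L, 5L, 2L, 3L, 3L, 5L, 4L, 3L),
  Y = c(2L, 3L, 3L, 3L, 4L, 5L, 3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 4L),
  Z = c(5L, 2L, 2L, 1L, 2L, 3L, 1L, 4L, 1L, 4L, 4L, 4L, 1L, 2L, 2L),
  Average = 3L
)

# Shuffle all rows, then keep the first row seen for each ID.
shuffled <- df[sample.int(nrow(df)), ]
picked <- shuffled[!duplicated(shuffled$ID), ]

# Count how many of the sampled rows share each X/Y/Z combination.
combo_counts <- table(paste(picked$X, picked$Y, picked$Z, sep = "-"))
print(combo_counts)
```

Any count above 1 in `combo_counts` is a combination that was drawn for more than one ID.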

Solution

This seems dubious, but try this:

library(dplyr)
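df %>% add_count(X, Y, Z) %>%   # n = how often each X/Y/Z combo occurs in the whole data
    group_by(ID) %>%
    top_n(-1, n) %>%            # within each ID, keep the rarest combo(s), ties included
    arrange(runif(n)) %>%       # shuffle the rows so that ties are broken at random
    select(-n) %>%
    slice(1)                    # take the first remaining row for each ID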
# # A tibble: 5 x 5
# # Groups:   ID [5]
#       ID     X     Y     Z Average
#   <fctr> <int> <int> <int>   <int>
# 1      A     2     2     5       3
# 2      B     1     5     3       3
# 3      C     2     3     4       3
# 4      D     3     2     4       3
# 5      E     3     4     2       3

This picks, for each ID, the XYZ combo that is least common across the whole data set, and in case of a tie chooses randomly, so the output can differ from run to run. Extremely common XYZ combos may be missing from the result entirely.
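
If the goal is balance among the rows actually picked rather than rarity in the full data, a different technique is a greedy pass. The following is only a sketch in base R, not the dplyr approach above: it visits the IDs in random order and, for each one, picks a row whose XYZ combo has been picked least often so far. The helper name `pick_balanced` and its internals are hypothetical, and a greedy pass does not guarantee the most even split possible.

```r
# Example data from the question.
df <- data.frame(
  ID = rep(c("A", "B", "C", "D", "E"), each = 3),
  X = c(2L, 4L, 4L, 5L, 3L, 1L, 5L, 2L, 5L, 2L, 3L, 3L, 5L, 4L, 3L),
  Y = c(2L, 3L, 3L, 3L, 4L, 5L, 3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 4L),
  Z = c(5L, 2L, 2L, 1L, 2L, 3L, 1L, 4L, 1L, 4L, 4L, 4L, 1L, 2L, 2L),
  Average = 3L
)

# Hypothetical helper: greedy one-row-per-ID sampling that favours
# X/Y/Z combinations which have been picked least often so far.
pick_balanced <- function(data) {
  combo <- paste(data$X, data$Y, data$Z, sep = "-")
  # Running tally of how often each combination has been picked.
  used <- setNames(integer(length(unique(combo))), unique(combo))
  # Visit the IDs in random order so no ID is systematically favoured.
  ids <- unique(as.character(data$ID))
  ids <- ids[sample.int(length(ids))]
  chosen <- integer(0)
  for (id in ids) {
    rows <- which(data$ID == id)
    counts <- used[combo[rows]]
    # Candidate rows: those whose combination is least used so far.
    best <- rows[counts == min(counts)]
    # Break ties at random (sample.int avoids sample()'s length-1 pitfall).
    pick <- best[sample.int(length(best), 1)]
    used[combo[pick]] <- used[combo[pick]] + 1L
    chosen <- c(chosen, pick)
  }
  out <- data[sort(chosen), ]
  rownames(out) <- NULL
  out
}

result <- pick_balanced(df)
print(result)
print(table(paste(result$X, result$Y, result$Z, sep = "-")))
```

Because the tally is updated after every pick, a very common combo can still be chosen, but each additional pick of it becomes less likely than a combo that has not been used yet. Duplicate rows within an ID are kept as separate candidates, so a combo that appears twice for an ID is twice as likely to win a tie.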