r, dplyr, data.table

Combine across groups within variable


I need to create all combinations across groups in a dataframe.

Let's say I start with:

library(tibble)

df <- tibble(
 j = c("x", "x", "y", "y", "z", "z"),
 k = c(100, 300, 20, 60, 40, 35),
 ind = c(0, 0, 0, 0, 1, 1)
)

What I need is something like this:

want <- tibble(
 j_k = c("100_20","100_60","300_20","300_60")
)

I've reviewed similar questions that point to functions like expand() and combn(). The problem is that most posts are about creating all combinations across variables, whereas I need combinations across groups within a variable, with the option to exclude groups based on an indicator (ind in my example) and with control over the delimiter between the combined values.

It's okay to change the data shape if that makes this easier.

Thanks for any help.


Solution

  • With expand.grid and split in base R:

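    # Keep rows with ind == 0, split k by group j, then cross the groups
    df_new <- with(df[df$ind == 0, ], expand.grid(split(k, j)))
    # Paste each row of the grid together with "_" as the delimiter
    do.call(paste, c(df_new, sep = "_"))
    #[1] "100_20" "300_20" "100_60" "300_60"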
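
The base R answer above returns the combinations with the first group varying fastest, which is a different order from the one shown in want. If the order matters, one possible sketch is to wrap the same split/expand.grid idea in a small function that reverses the groups before crossing them. The name combine_groups and its sep argument are hypothetical, not part of the original answer, and the function assumes the columns are called j, k and ind as in the question. A plain data.frame is used here so the example runs without extra packages.

```r
# Hypothetical helper (not from the original answer): cross the k values
# between the groups of j, keeping only rows where ind == 0.
combine_groups <- function(df, sep = "_") {
  kept <- df[df$ind == 0, ]
  # One vector of k values per remaining group; drop = TRUE discards
  # empty groups if j happens to be a factor
  groups <- split(kept$k, kept$j, drop = TRUE)
  # expand.grid varies its first argument fastest, so reverse the groups
  # to make the first group vary slowest, then restore the column order
  grid <- expand.grid(rev(groups), stringsAsFactors = FALSE)
  grid <- grid[rev(names(grid))]
  # Paste each row together with the chosen delimiter
  do.call(paste, c(unname(as.list(grid)), sep = sep))
}

df <- data.frame(
  j = c("x", "x", "y", "y", "z", "z"),
  k = c(100, 300, 20, 60, 40, 35),
  ind = c(0, 0, 0, 0, 1, 1)
)

result <- combine_groups(df)
print(result)
#[1] "100_20" "100_60" "300_20" "300_60"
```

Calling combine_groups(df, sep = "-") would give the same combinations joined with a hyphen instead.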