r, dplyr, inner-join

Joining data frames with recurring variable


When I join two data frames with inner_join(), I get one row for every matching combination, so each case is repeated once for every village that shares its ZIP code.

An example:

library(dplyr)

# Several villages can share one ZIP code
villages <- data.frame(
  zip_code = c("1000", "1000", "1001"),
  village  = c("village_x", "village_y", "village_z")
)

cases <- data.frame(
  zip_code = c("1000", "1000", "1001"),
  case     = c("case1", "case2", "case3")
)

# Each case in ZIP code 1000 matches both villages, so cases get duplicated
data <- inner_join(villages, cases, by = "zip_code")

The join returns more rows than there are cases (5 rows for 3 cases), because several villages share the same ZIP code.

How can I make it so that villages with the same ZIP code are in the same cell?

Alternatively, can the merge pair each case with only the first matching village?


Solution

  • @ConnerSexton's solution worked:

    data <- inner_join(villages, cases, by = "zip_code") %>%
      group_by(zip_code, case) %>%
      summarize(village = paste(village, collapse = ", "), .groups = "drop")

    Thanks a lot!
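
For the second question (pairing each case with only the first village found), here is a minimal sketch that is not part of the accepted answer. It assumes "first" means the first row per ZIP code in the order `villages` is stored, and the names `first_village`, `first_match`, `villages_per_zip` and `collapsed` are made up for illustration. It also shows a variant that collapses the villages before joining, so the join itself never duplicates cases.

```r
library(dplyr)

villages <- data.frame(
  zip_code = c("1000", "1000", "1001"),
  village  = c("village_x", "village_y", "village_z")
)
cases <- data.frame(
  zip_code = c("1000", "1000", "1001"),
  case     = c("case1", "case2", "case3")
)

# Option 1: keep only the first village listed for each ZIP code, then join.
# distinct() with .keep_all = TRUE keeps the first row of each zip_code.
first_village <- distinct(villages, zip_code, .keep_all = TRUE)
first_match <- inner_join(cases, first_village, by = "zip_code")

# Option 2: collapse all villages per ZIP code into one string, then join.
# The lookup table now has one row per zip_code, so no case is duplicated.
villages_per_zip <- villages %>%
  group_by(zip_code) %>%
  summarize(village = paste(village, collapse = ", "), .groups = "drop")
collapsed <- inner_join(cases, villages_per_zip, by = "zip_code")
```

Both results have one row per case. Reducing the lookup table to one row per key before joining also keeps the intermediate join small, which may matter if the real data has many villages per ZIP code.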