Tags: r, join, semi-join

Merging two data frames horizontally by ID and keeping only matches from the second one


I have two data frames which I want to merge horizontally:

dat_a

  a b c
1 1 1 A
2 2 1 A
3 3 1 B
4 4 1 B


dat_b

  a b c
1 3 1 C
2 3 1 C
3 3 1 D
4 4 1 D

I want to only keep those rows from dat_a which have a match in dat_b for columns a and b.

So the final result should look like this:

dat_c
  a b c
1 3 1 B
2 4 1 B
3 3 1 C
4 3 1 C
5 3 1 D
6 4 1 D

Solution

  • Try semi_join from the dplyr package.

    If you want only the rows of dat_a that have a match in dat_b, you can use:

    library(dplyr)
    dat_a %>% semi_join(dat_b, by = c("a", "b"))
    

    If, as in your desired output, you want all rows of dat_a that have a match in dat_b and all rows of dat_b that have a match in dat_a, try:

    # The pipe must end each line; a line starting with %>% is a syntax error
    dat_a %>%
      semi_join(dat_b, by = c("a", "b")) %>%
      bind_rows(dat_b %>% semi_join(dat_a, by = c("a", "b")))
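
For anyone who wants to run this end to end, here is a minimal sketch. The data frames are rebuilt by hand from the tables printed in the question, so the column types (numeric a and b, character c) are an assumption. The names a_in_b, b_in_a and base_a_in_b are only illustrative, and the base R lines at the end are an alternative cross-check of the first step, not part of the answer above.

```r
library(dplyr)

# Data frames reconstructed from the printed tables in the question
# (column types are assumed: numeric a and b, character c)
dat_a <- data.frame(a = c(1, 2, 3, 4), b = 1, c = c("A", "A", "B", "B"),
                    stringsAsFactors = FALSE)
dat_b <- data.frame(a = c(3, 3, 3, 4), b = 1, c = c("C", "C", "D", "D"),
                    stringsAsFactors = FALSE)

# Rows of dat_a whose (a, b) pair occurs in dat_b
a_in_b <- dat_a %>% semi_join(dat_b, by = c("a", "b"))

# Rows of dat_b whose (a, b) pair occurs in dat_a
b_in_a <- dat_b %>% semi_join(dat_a, by = c("a", "b"))

# Stack the two filtered frames; this gives 6 rows,
# with column c reading B, B, C, C, D, D
dat_c <- bind_rows(a_in_b, b_in_a)
print(dat_c)

# Base R cross-check of the first step: build a key from a and b
# and keep the rows of dat_a whose key appears in dat_b
key_a <- paste(dat_a$a, dat_a$b)
key_b <- paste(dat_b$a, dat_b$b)
base_a_in_b <- dat_a[key_a %in% key_b, ]
```

A semi join only filters rows and never adds columns from the other table, which is why the two filtered frames still have the same three columns and can be stacked with bind_rows.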