Here's my dataframes for reproducing the problem:
df1 <- data.frame(id=c("","","634703802","634927873","635812953","636004739","636101211","636157799","636263106","636752420"),
text=c("asdf","cat","dog","mouse","elephant","goose","rat","mice","kitty","kitten"),
response=c("y","y","y","n","n","y","y","n","n","y"))
df2 <- data.frame(id=c("","","634703802","634927873","635812953","636004739","636101211","636157799","636263106","636752420","636809222","2004722036","2004894388","2005045755","2005535472","2005630542","2005788781","2005809679","2005838317","2005866692"),
text=c("asdf_xyz","cat","dog","mouse","elephant","goose","rat","mice","kitty","kitten","tiger_xyz","lion","leopard","ostrich","kangaroo","platypus","fish","reptile","mammals","amphibians_xyz"),
volume=c("1234","432","324","333","2223","412346","7456","3456","2345","2345","6","345","23","2","4778","234","8675","3459","8","9"))
What I have tried:
joined_df1 <- power_full_join(df1, df2, by = c("id"),
conflict = rw ~ ifelse(.x != .y,
NA_integer_,
.x), na_matches = "never")
which gives
id response volume text
1 <NA> y <NA> <NA>
2 <NA> y <NA> <NA>
3 634703802 y 324 dog
4 634927873 n 333 mouse
5 635812953 n 2223 elephant
6 636004739 y 412346 goose
7 636101211 y 7456 rat
8 636157799 n 3456 mice
9 636263106 n 2345 kitty
10 636752420 y 2345 kitten
11 <NA> <NA> 1234 <NA>
12 <NA> <NA> 432 <NA>
13 636809222 <NA> 6 <NA>
14 2004722036 <NA> 345 <NA>
15 2004894388 <NA> 23 <NA>
16 2005045755 <NA> 2 <NA>
17 2005535472 <NA> 4778 <NA>
18 2005630542 <NA> 234 <NA>
19 2005788781 <NA> 8675 <NA>
20 2005809679 <NA> 3459 <NA>
21 2005838317 <NA> 8 <NA>
22 2005866692 <NA> 9 <NA>
but I do not want to join for if NA is present and the expected output I want is:
id response volume text
3 634703802 y 324 dog
4 634927873 n 333 mouse
5 635812953 n 2223 elephant
6 636004739 y 412346 goose
7 636101211 y 7456 rat
8 636157799 n 3456 mice
9 636263106 n 2345 kitty
10 636752420 y 2345 kitten
13 636809222 <NA> 6 <NA>
14 2004722036 <NA> 345 <NA>
15 2004894388 <NA> 23 <NA>
16 2005045755 <NA> 2 <NA>
17 2005535472 <NA> 4778 <NA>
18 2005630542 <NA> 234 <NA>
19 2005788781 <NA> 8675 <NA>
20 2005809679 <NA> 3459 <NA>
21 2005838317 <NA> 8 <NA>
22 2005866692 <NA> 9 <NA>
don't worry about the left numbering. Can someone help pls? Thanks.
We could convert the blanks (""
) to NA
, filter
all NA
elements from 'id' in both datasets and then do the power_full_join
library(dplyr)
library(powerjoin)
df1 %>%
na_if("") %>%
filter(complete.cases(id)) %>%
power_full_join(df2 %>%
na_if("") %>%
filter(complete.cases(id)), by = c("id"),
conflict = rw ~ ifelse(.x != .y,
NA_integer_, .x), na_matches = "never")