I'm looking for a way to make dplyr'r full_join treat NA in the "by" columns as wildcards for joining, i.e. that a NA will match any value in the corresponding column of the other data frame and join the corresponding rows.
EXAMPLE: I have several data.frames in R, defining certain product characteristics, some of which depend on finer-grained features, others only on the high-level product class:
x = data.frame(class = c("A", "B", "B", "C"), flags = c(NA, "individual", "group", NA), interest = c(0.04, 0.03, 0.02, 0.05));x
y = data.frame(class = c("A", "A", "B", "B", "C"), flags = c("individual", "group", "individual", "group", NA), costs = c(0.05, 0.025, 0.03, 0.02, 0.01));y
> x
class flags interest
1 A <NA> 0.04
2 B individual 0.03
3 B group 0.02
4 C <NA> 0.05
> y
class flags costs
1 A individual 0.050
2 A group 0.025
3 B individual 0.030
4 B group 0.020
5 C <NA> 0.010
Now I want to join those two to create one table of product characteristics, but I want the NA in the 'flags' column to match any value in the flags column of the other data.frame, resulting in the following joint data.frame:
> expected
class flags interest costs
1 A individual 0.04 0.050
2 A group 0.04 0.025
3 B individual 0.03 0.030
4 B group 0.02 0.020
5 C <NA> 0.05 0.010
Unfortunately, full_join only matches by exact string match, so the in the flags column will not match any of the values ("group", "partner", "individual") of the flags column of the other data frame:
> actual = full_join(x,y, by=c("class","flags"));actual
class flags interest costs
1 A <NA> 0.04 NA
2 B individual 0.03 0.030
3 B group 0.02 0.020
4 C <NA> 0.05 0.010
5 A individual NA 0.050
6 A group NA 0.025
How can I make full_join treat NA as matching any string (rather than matching only other NAs)?
You can join only on the class
column and then filter:
library(dplyr)
x |>
full_join(y, by = "class", relationship = "many-to-many") |>
filter(flags.x == flags.y | is.na(flags.x) | is.na(flags.y)) |>
mutate(flags = coalesce(flags.x, flags.y)) |>
select(class, flags, interest, costs)
# class flags interest costs
# 1 A individual 0.04 0.050
# 2 A group 0.04 0.025
# 3 B individual 0.03 0.030
# 4 B group 0.02 0.020
# 5 C <NA> 0.05 0.010