I have two data frames that look like these:
df1 <- data.frame("Year" = c(1991, 1997, 2002, 2009, 2016, 2024),
"Name" = c("Amy", "Ben", "Chad", "Dan", "Emily", "Frank"),
"Gender" = c("F", "M", "M", "M", "F", "M"))
df2 <- data.frame("Year" = c(1997, 1997, 1997, 2009, 2009, 2016, 2016),
"Name" = c("Ben", "Ben", "Ben", "Dan", "Dan", "Emily", "Emily"),
"Gender" = c("M", "M", "M", "M", "M", "F", "F"),
"Year_new" = c(1998, 2002, 2004, 2009, 2010, 2017, 2020),
"Name_new" = c("Bella", "Brandon", "Brittany", "David", "Dylan", "Emma", "Ethan"),
"Gender_new" = c("F", "M", "F", "M", "M", "F", "M"))
I want to merge df1
and df2
to replace Year
, Name
and Gender
that overlap between df1
and df2
into Year_new
, Name_new
and Gender_new
, so that the result will look like this:
df3 <- data.frame("Year" = c(1991, 1998, 2002, 2004, 2002, 2009, 2010, 2017, 2020, 2024),
"Name" = c("Amy", "Bella", "Brandon", "Brittany", "Chad", "David", "Dylan", "Emma", "Ethan", "Frank"),
"Gender" = c("F", "F", "M", "F", "M", "M", "M", "F", "M", "M"))
I tried doing this by using full_join
, coalesce
and rename
like below.
df3 <- full_join(df1, df2) %>%
mutate("Year_new" = coalesce(Year, Year_new),
"Name_new" = coalesce(Name, Name_new),
"Gender_new" = coalesce(Gender, Gender_new)) %>%
select(Year_new:Gender_new) %>%
rename(Year = Year_new, Name = Name_new, Gender = Gender_new)
But this way, coalesce
is overwriting the values I'd like to keep from df2
. I also feel like this might not be the easiest/most efficient way. Is there a function other than full_join
or coalesce
that would work in this case?
Bind what's only in df1
with what's new in df2
(properly renamed):
library(dplyr)
# You don't really need this "column-selection" helper but..
aux_cols <- c("Year", "Name", "Gender")
# Bind what's only in `df1`..
# ..with what´s "new" in `df2` (renamed withou "_new)
new_df <- bind_rows(
anti_join(df1, df2, by = aux_cols),
transmute(
semi_join(df2, df1, by = aux_cols),
across(ends_with("_new"), .names = "{str_remove({.col}, '_new')}")))
The output:
# `arrange` the output by `Year and `Name`
> arrange(new_df, Year, Name)
Year Name Gender
1 1991 Amy F
2 1998 Bella F
3 2002 Brandon M
4 2002 Chad M
5 2004 Brittany F
6 2009 David M
7 2010 Dylan M
8 2017 Emma F
9 2020 Ethan M
10 2024 Frank M
Edit. If you're not into tidyverse
that much, you can try this:
aux_cols <- c("Year", "Name", "Gender")
df3 <- merge(df1, df2, all.x = TRUE)
aux_rows <- complete.cases(df3)
aux_cols <- c("Year", "Name", "Gender")
df3[aux_rows, aux_cols] <- df3[aux_rows, paste0(aux_cols, "_new")]
df3 <- df3[do.call(order, df3[aux_cols]), aux_cols]
The output:
> df3
Year Name Gender
1 1991 Amy F
2 1998 Bella F
3 2002 Brandon M
5 2002 Chad M
4 2004 Brittany F
6 2009 David M
7 2010 Dylan M
8 2017 Emma F
9 2020 Ethan M
10 2024 Frank M