Suppose I have two dataframes:
df1 <- data.frame(
Name = c("Alice", "Bob", "Charlie"),
Age = c(25, 30, 22),
City = c("New York", "San Francisco", "Los Angeles"),
Score = c(95, 88, 92)
)
df2 <- data.frame(
Name = c("Alice", "Bob", "Charlie"),
Age = c(25, 30, 22),
City = c("New York", "San Francisco", "Los Angeles"),
Score = c(95, 100, 92)
)
And I've noticed that Bob has changed his score (come on Bob!).
I want to find the differences between the two dataframes and when there is a difference, keep the values across all possibly different columns from the 1st dataframe.
I'm able to find the differences:
differences <- df1 != df2
But I'm unsure where to go from here. Ultimately, I need a dataframe where anything that is the same (or FALSE in the differences df) is NA and only the things that are different are in the dataframe. It would look something like:
fixed_df <- data.frame(
Name = c(NA_character_, NA_character_, NA_character_),
Age = c(NA_real_, NA_real_, NA_real_),
City = c(NA_character_, NA_character_, NA_character_),
Score = c(NA_real_, 100, NA_real_)
)
Thanks for your help!
use the forgotten function is.na<-
`is.na<-`(df2, df1 == df2)
Name Age City Score
1 <NA> NA <NA> NA
2 <NA> NA <NA> 100
3 <NA> NA <NA> NA
you could use is.na
but does an inplace replacement. Therefore we need a temp dataset so as to maintain the original:
temp <- df2
is.na(temp) <- df2 == df1
temp
Name Age City Score
1 <NA> NA <NA> NA
2 <NA> NA <NA> 100
3 <NA> NA <NA> NA