I have a dataframe o
which contains incorrect data. I have another dataframe r
with the correct data. I want to replace the subset of o
(both rows and columns) with r
. I'm seeking a solution that scales to many variables / columns. (I do have a vector rv
with the column names of the variables that need replacing)
o <- tibble(id = c(2, 3, 4, 7, 10, 2), aa = letters[1:6], bb = LETTERS[2:7], cc = 11:16)
r <- tibble(id = c(2,4), cc = c(111, 113), aa = c("aa", "cc"))
rv <- c("aa", "cc")
> o
# A tibble: 6 × 4
id aa bb cc
<dbl> <chr> <chr> <int>
1 2 a B 11
2 3 b C 12
3 4 c D 13
4 7 d E 14
5 10 e F 15
6 2 f G 16
> r
# A tibble: 2 × 3
id cc aa
<dbl> <dbl> <chr>
1 2 111 aa
2 4 113 cc
I want to end up with (I don't care about the order of rows or columns):
> o
# A tibble: 6 × 4
id aa bb cc
<dbl> <chr> <chr> <int>
1 2 aa B 111
2 3 b C 12
3 4 cc D 113
4 7 d E 14
5 10 e F 15
6 2 aa G 111
You can assume that all ids in r
are in o
, but you cannot assume that columns or rows are the in the same order, and there are duplicated id
in o
(not in r
), where the relevant columns should also be overwritten.
I tried base R, but the duplicate id
in o
stumps it:
o[o$id %in% r$id, rv] <- r
I thought of dplyr
using left_join
but don't know how to overwrite rather than attach.
My question is different from similar ones because there are several columns to replace, but not the entire dataset -- on which I had found useful answers.
If the question is looking for a base R solution that fixes up the attempt in the question then note that we can only write o[...] <- r with just r
on the right hand side if it is sufficient to replace just the first occurrence of each id in o. If that were ok then we could fix it like this:
# only replaces first instance of each r id in o
o[match(r$id, o$id), names(r)] <- r
With duplicate id's in o, as in the example data, we will need a loop; however, it is still only one line of code:
# replace all instances of each id so can be used with example data
for(id in r$id) o[o$id == id, names(r)] <- r[r$id == id, ]
An equivalent alternative to the explicit loop is to use Reduce like this:
upd <- function(o, id) {
o[o$id == id, names(r)] <- r[r$id == id, ]
o
}
Reduce(upd, r$id, init = o)