I have a list of data frames, a, b, and c. I want to end up with a list where a is unchanged, b contains only those rows not in a, and c contains only those rows not in b.
# Sample data
a <- data.frame(num = 1:4, let = letters[1:4])
b <- data.frame(num = 2:6, let = letters[2:6])
c <- data.frame(num = 4:8, let = letters[4:8])
dfs <- list(a, b, c)
The part that's tripping me up is that each anti_join needs to use the original previous data frame, not the result of the previous anti_join. My instinct is to use accumulate from purrr, but I can't figure out how to make it join against the original previous data frame.
library(purrr)
library(dplyr)

dfs |>
  accumulate(~ anti_join(.y, .x))
[[1]]
num let
1 1 a
2 2 b
3 3 c
4 4 d
[[2]]
num let
1 5 e
2 6 f
[[3]]
num let
1 4 d
2 7 g
3 8 h
Because each step joins against the already filtered previous data frame, the third data frame still contains 4 d, which I don't want.
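To see why 4 d survives, it can help to unroll the accumulate() call by hand. The sketch below is a minimal illustration using the question's sample data; `by_cols`, `step2`, and `step3` are hypothetical names introduced here, and `by` is spelled out only to silence dplyr's "Joining with by = ..." message.

```r
library(dplyr)

by_cols <- c("num", "let")
a <- data.frame(num = 1:4, let = letters[1:4])
b <- data.frame(num = 2:6, let = letters[2:6])
c <- data.frame(num = 4:8, let = letters[4:8])

# Step 2 of accumulate(): rows of b not in a (5 e, 6 f)
step2 <- anti_join(b, a, by = by_cols)

# Step 3 filters c against step2, NOT against the original b,
# so 4 d (in the original b, but dropped from step2 because it is in a) survives
step3 <- anti_join(c, step2, by = by_cols)
step3$num
#> [1] 4 7 8
```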
I also tried .dir = "backward" as a way to join against the original data frames, but it doesn't do what I expected:
dfs |>
  accumulate(~ anti_join(.y, .x), .dir = "backward")
[[1]]
num let
1 7 g
2 8 h
[[2]]
num let
1 7 g
2 8 h
[[3]]
num let
1 4 d
2 5 e
3 6 f
4 7 g
5 8 h
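The backward output follows from how purrr orders arguments in that direction: per the purrr documentation, reducing 1:4 with .dir = "backward" is equivalent to f(1, f(2, f(3, 4))), so .f receives the current element first and the accumulated value second. Inside the formula, .x is therefore the current element and .y the result so far, which means anti_join(.y, .x) keeps filtering c rather than each data frame by its predecessor. The sketch below reproduces this by hand; `r1`, `r2`, `r3`, `manual`, and `auto` are illustrative names only.

```r
library(dplyr)
library(purrr)

dfs <- list(
  data.frame(num = 1:4, let = letters[1:4]),  # a
  data.frame(num = 2:6, let = letters[2:6]),  # b
  data.frame(num = 4:8, let = letters[4:8])   # c
)
by_cols <- c("num", "let")

# Backward accumulation keeps the last element (c) as the starting value
r3 <- dfs[[3]]
# Next call is .f(b, r3): .x = b, .y = r3, so anti_join(.y, .x) filters r3 by b
r2 <- anti_join(r3, dfs[[2]], by = by_cols)
# Last call is .f(a, r2): r2 is filtered by a, which removes nothing here
r1 <- anti_join(r2, dfs[[1]], by = by_cols)

manual <- list(r1, r2, r3)
auto <- accumulate(dfs, ~ anti_join(.y, .x, by = by_cols), .dir = "backward")
```

This is why [[1]] and [[2]] both contain 7 g and 8 h, while [[3]] is c unchanged.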
Is there a way to set the arguments for accumulate so it can do this, or will I need a different approach? I prefer purrr/tidyverse if possible, but I'm open to anything that does what I need.
Expected output:
[[1]]
num let
1 1 a
2 2 b
3 3 c
4 4 d
[[2]]
num let
1 5 e
2 6 f
[[3]]
num let
1 7 g
2 8 h
You can use accumulate2 instead to implement a rolling anti_join, where each data frame is anti-joined with its original predecessor.
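library(purrr)
library(dplyr)

# ..1 = result so far (unused), ..2 = current data frame,
# ..3 = original previous data frame, taken from head(dfs, -1)
accumulate2(dfs, head(dfs, -1), ~ anti_join(..2, ..3))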
[[1]]
num let
1 1 a
2 2 b
3 3 c
4 4 d
[[2]]
num let
1 5 e
2 6 f
[[3]]
num let
1 7 g
2 8 h
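Because the accumulated value (..1) is never used, nothing is really being accumulated here. The same result can be sketched as a pairwise map2() over each data frame and its original predecessor, with the first data frame kept as is. This is an alternative formulation, not the answer's method; `filtered` and `res` are hypothetical names.

```r
library(dplyr)
library(purrr)

dfs <- list(
  data.frame(num = 1:4, let = letters[1:4]),
  data.frame(num = 2:6, let = letters[2:6]),
  data.frame(num = 4:8, let = letters[4:8])
)

# Pair each data frame from the 2nd on with its ORIGINAL predecessor:
# .x = current data frame, .y = previous data frame
filtered <- map2(dfs[-1], head(dfs, -1), ~ anti_join(.x, .y, by = c("num", "let")))

# Put the first data frame back in front, unchanged
res <- c(dfs[1], filtered)
```

Writing it this way makes explicit that every step depends only on the original inputs, never on an earlier step's result.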