r dplyr purrr

Anti-join dataframes in list to original prior dataframe using `accumulate`


I have a list of dataframes, a, b, and c. I want to end up with a list where a will not be changed, b will contain only those rows not in a, and c will contain only those rows not in b.

# Packages used below
library(dplyr)
library(purrr)

# Sample data
a <- data.frame(num = 1:4, let = letters[1:4])
b <- data.frame(num = 2:6, let = letters[2:6])
c <- data.frame(num = 4:8, let = letters[4:8])

dfs <- list(a, b, c)

The part that's tripping me up is that each anti_join needs to run against the original prior dataframe, not the result of that dataframe's own anti-join. My instinct is to use accumulate from purrr, but I can't figure out how to make it use the original prior dataframe.

dfs |> 
  accumulate(~anti_join(.y, .x))

[[1]]
  num let
1   1   a
2   2   b
3   3   c
4   4   d

[[2]]
  num let
1   5   e
2   6   f

[[3]]
  num let
1   4   d
2   7   g
3   8   h

Because each step joins against the already filtered previous result rather than the original b, row 4 d stays in the 3rd dataframe, which I do not want.
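To make the problem concrete, here are the two accumulate steps written out by hand. This is only a sketch: it assumes c holds rows 4 to 8, which is what the printed outputs imply, and step1, step2 and wanted are names I made up for illustration.

```r
library(dplyr)

a <- data.frame(num = 1:4, let = letters[1:4])
b <- data.frame(num = 2:6, let = letters[2:6])
c <- data.frame(num = 4:8, let = letters[4:8])

# Step 1: accumulate() computes b minus a (joins on all shared columns)
step1 <- anti_join(b, a)

# Step 2: accumulate() passes step1, not the original b, as the prior value,
# so only rows 5 and 6 are removed from c and row 4 survives
step2 <- anti_join(c, step1)

# What I actually want for the 3rd dataframe: c minus the original b
wanted <- anti_join(c, b)
```

Here step2 keeps 4 d while wanted does not, which is exactly the difference I'm trying to get rid of.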

I have tried .dir = "backward" as a way to use the original dfs for joining, but that is not doing what I think it should be doing:

dfs |> 
   accumulate(~anti_join(.y, .x), .dir = "backward")

[[1]]
  num let
1   7   g
2   8   h

[[2]]
  num let
1   7   g
2   8   h

[[3]]
  num let
1   4   d
2   5   e
3   6   f
4   7   g
5   8   h

Is there a way to set the arguments for accumulate so it can do this, or will I need a different approach? I prefer purrr/tidyverse if possible but am open to anything that does what I need.

Expected output:

[[1]]
  num let
1   1   a
2   2   b
3   3   c
4   4   d

[[2]]
  num let
1   5   e
2   6   f

[[3]]
  num let
1   7   g
2   8   h

Solution

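  • You can use accumulate2 instead to implement a rolling anti_join against the original dataframes. accumulate2 calls the function with three arguments: the accumulated result (..1), the next element of the first list (..2), and the next element of the second list (..3). Passing head(dfs, -1) as the second list pairs each dataframe with its original predecessor, and the accumulated result is simply ignored.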

    accumulate2(dfs, head(dfs, -1), ~ anti_join(..2, ..3))
    
    [[1]]
      num let
    1   1   a
    2   2   b
    3   3   c
    4   4   d
    
    [[2]]
      num let
    1   5   e
    2   6   f
    
    [[3]]
      num let
    1   7   g
    2   8   h
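Since the accumulated value is never used, the same pairing can probably be written without accumulate at all. A minimal sketch using map2, assuming the sample data with c holding rows 4 to 8 (as the printed outputs imply); rest and result are illustrative names, not anything from purrr:

```r
library(dplyr)
library(purrr)

a <- data.frame(num = 1:4, let = letters[1:4])
b <- data.frame(num = 2:6, let = letters[2:6])
c <- data.frame(num = 4:8, let = letters[4:8])
dfs <- list(a, b, c)

# Pair every dataframe from the second onward with its original predecessor:
# dfs[-1] is (b, c) and head(dfs, -1) is (a, b)
rest <- map2(dfs[-1], head(dfs, -1), ~ anti_join(.x, .y))

# Put the untouched first dataframe back in front
result <- append(dfs[1], rest)
```

This makes it explicit that each join only depends on two neighbouring originals, at the cost of having to re-attach the first element by hand.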