Please note that the order of the data and the VALUE column are different from my prior question. How do I remove offsetting rows in a tibble? The accepted answer for the question works for the data set I provided, however changing values causes the answer to fail.
I am trying to remove rows that have offsetting values.
library(dplyr)
a <- c(1, 1, 1, 1, 2, 2, 2, 2,2,2)
b <- c("a", "b", "b", "b", "c", "c","c", "d", "d", "d")
d <- c(10, 10, -10, 50, 20, -20, 60, 30, -30, 70)
o <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
df <- tibble(ID = a, SEQ = b, VALUE = d, OTHER = o)
Generates this ordered table that is grouped by ID and SEQ.
> df
# A tibble: 10 x 4
ID SEQ VALUE OTHER
<dbl> <chr> <dbl> <chr>
1 1 a 10 A
2 1 b -10 B
3 1 b 10 C
4 1 b 50 D
5 2 c -20 E
6 2 c 20 F
7 2 c 60 G
8 2 d -30 H
9 2 d 30 I
10 2 d 70 J
I want to drop the row pairs (2,3), (5,6), (8,9) because VALUE negates the VALUE in the matching previous row.
I want the resulting table to be
> df2
# A tibble: 4 x 4
ID SEQ VALUE OTHER
<dbl> <chr> <dbl> <chr>
1 1 a 10 A
2 1 b 50 D
3 2 c 60 G
4 2 d 70 J
I know that I can't use group_by %>% summarize
, because I need to keep the value that is in OTHER. I've looked at the dplyr::lag()
function but I don't see how that can help. I believe that I could loop through the table with some type of for each
loop and generate a logical vector that can be used to drop the rows, but I was hoping for a more elegant solution.
Unless I completely misunderstood your question, does this do the job?
idx <- which(diff(cumsum(df$VALUE)) < 0);
df[-c(idx, idx + 1), ];
## A tibble: 4 x 4
# ID SEQ VALUE OTHER
# <dbl> <chr> <dbl> <chr>
#1 1 a 10 A
#2 1 b 50 D
#3 2 c 60 G
#4 2 d 70 J