I am attempting to filter rows from a dataframe (df
) using the contents in specific columns (col1
, col2
, and col3
) in the next row.
This question got close but only uses one column to lag
Most of the posts showing how to filter using lag/lead have columns which are numbers, in my case they are all text.
df <- tibble::tribble(
~col1, ~col2, ~col3, ~Effect,
"Jim", "Walk", "optionA", "col1×col2",
"Jim", "Walk", "optionA", "col1×col2×col2",
"Jim", "Run", "optionB", "col1",
"Jim", "Run", "optionB", "col1×col2",
"Jim", "Run", "optionB", "col1×col2×col2",
"Joe", "Walk", "optionA", "col1",
"Joe", "Walk", "optionA", "col1×col2",
"Joe", "Run", "optionB", "col1×col2×col2"
)
I would like to filter rows if the next row (except the Effect
column) is identical.
The final dataframe would look like this
df_result <- tibble::tribble(
~col1, ~col2, ~col3, ~Effect,
"Jim", "Walk", "optionA", "col1×col2×col2",
"Jim", "Run", "optionB", "col1×col2×col2",
"Joe", "Walk", "optionA", "col1×col2",
"Joe", "Run", "optionB", "col1×col2×col2"
)
Does anyone have any suggestions? If possible I would like to use the tidyverse to get the solution.
We can use distinct
library(dplyr)
df %>%
slice(rev(row_number())) %>%
distinct(across(col1:col3), .keep_all = TRUE)
-ouptut
# A tibble: 4 x 4
col1 col2 col3 Effect
<chr> <chr> <chr> <chr>
1 Joe Run optionB col1×col2×col2
2 Joe Walk optionA col1×col2
3 Jim Run optionB col1×col2×col2
4 Jim Walk optionA col1×col2×col2
Or using nchar
df %>%
group_by(across(col1:col3)) %>%
slice(which.max(nchar(Effect))) %>%
ungroup