I want to filter grouped data using either (a) a partial string specified in another column or, if that is easier, (b) a partial string that I specify in the code.
I have the following data frame:
df <- structure(list(
sen = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2),
trial = c("standard", "standard", "standard", "standard", "standard", "silence", "silence", "silence", "silence", "silence", "deviant", "deviant", "deviant", "deviant", "deviant","standard", "standard", "standard", "standard", "standard", "silence", "silence", "silence", "silence", "silence", "deviant", "deviant", "deviant", "deviant", "deviant"),
ppt = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3),
ia_label = c("The_TW1", "cow_TW2", "jumped_TW3", "the_TW4", "gate_TW5","The_TW1", "cow_TW2", "jumped_TW3", "the_TW4", "gate_TW5", "The_TW1", "cow_TW2", "jumped_TW3", "the_TW4", "gate_TW5", "The_TW1", "cow_TW2", "jumped_TW3", "the_TW4", "gate_TW5","The_TW1", "cow_TW2", "jumped_TW3", "the_TW4", "gate_TW5", "The_TW1", "cow_TW2", "jumped_TW3", "the_TW4", "gate_TW5"),
target_pos = c("0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "TW3", "TW3", "TW3", "TW3", "TW3", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "TW4", "TW4", "TW4", "TW4", "TW4")),
.Names = c("sen","trial","ppt","ia_label","target_pos"),
row.names = c(NA, -30L),
class = "data.frame")
sen | trial | ppt | ia_label | target_pos |
---|---|---|---|---|
1 | standard | 1 | The_TW1 | 0 |
1 | standard | 1 | cow_TW2 | 0 |
1 | standard | 1 | jumped_TW3 | 0 |
1 | standard | 1 | the_TW4 | 0 |
1 | standard | 1 | gate_TW5 | 0 |
1 | silence | 2 | The_TW1 | 0 |
1 | silence | 2 | cow_TW2 | 0 |
1 | silence | 2 | jumped_TW3 | 0 |
1 | silence | 2 | the_TW4 | 0 |
1 | silence | 2 | gate_TW5 | 0 |
1 | deviant | 3 | The_TW1 | TW3 |
1 | deviant | 3 | cow_TW2 | TW3 |
1 | deviant | 3 | jumped_TW3 | TW3 |
1 | deviant | 3 | the_TW4 | TW3 |
1 | deviant | 3 | gate_TW5 | TW3 |
2 | standard | 1 | The_TW1 | 0 |
2 | standard | 1 | cow_TW2 | 0 |
2 | standard | 1 | jumped_TW3 | 0 |
2 | standard | 1 | the_TW4 | 0 |
2 | standard | 1 | gate_TW5 | 0 |
2 | silence | 2 | The_TW1 | 0 |
2 | silence | 2 | cow_TW2 | 0 |
2 | silence | 2 | jumped_TW3 | 0 |
2 | silence | 2 | the_TW4 | 0 |
2 | silence | 2 | gate_TW5 | 0 |
2 | deviant | 3 | The_TW1 | TW4 |
2 | deviant | 3 | cow_TW2 | TW4 |
2 | deviant | 3 | jumped_TW3 | TW4 |
2 | deviant | 3 | the_TW4 | TW4 |
2 | deviant | 3 | gate_TW5 | TW4 |
I want to keep only the rows whose 'ia_label' contains the string given in 'target_pos' for the deviant condition (either TW3 or TW4), and I want to do this per group of 'sen'. So for all of sen = 1, I want to keep only the rows with ia_label containing _TW3, and for all of sen = 2, only the rows with ia_label containing _TW4:
sen | trial | ppt | ia_label | target_pos |
---|---|---|---|---|
1 | standard | 1 | jumped_TW3 | 0 |
1 | silence | 2 | jumped_TW3 | 0 |
1 | deviant | 3 | jumped_TW3 | TW3 |
2 | standard | 1 | the_TW4 | 0 |
2 | silence | 2 | the_TW4 | 0 |
2 | deviant | 3 | the_TW4 | TW4 |
I only have a small number of different strings that I need to filter by, so I don't mind running this manually by specifying the partial string I want to filter 'ia_label' by, if it isn't possible to filter each group by the partial string specified within the 'target_pos' column.
I tried the following code with group_by, filter and grepl, but I get the error below:
library(dplyr)
Df2 <- df %>%
group_by(sen) %>%
filter(df, grepl("TW3",ia_label))
Output:
Error: Problem with `filter()` input `..1`.
x Input `..1` must be of size 15 or 1, not size 30.
i Input `..1` is `df`.
i The error occurred in group 1: sen = 1.
Run `rlang::last_error()` to see where the error occurred.
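The error occurs because you are both piping `df` into `filter` and passing it again inside `filter`. The second `df` is treated as a filter condition, and within each group a condition must have length 15 (the group size) or 1, not the 30 rows of the full data frame. You can avoid the error by changing `filter(df, grepl(...` to `filter(grepl(...`.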
df %>%
group_by(sen) %>%
filter(grepl("TW3", ia_label))
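Note that the hard-coded `"TW3"` above keeps the `_TW3` rows in both groups. If you prefer to specify the strings manually (option b in the question), one possible sketch is a lookup vector keyed by `sen`. The name `patterns` and the compact rebuild of `df` with `rep()` are my own assumptions for illustration, not part of the original code:

```r
library(dplyr)

# Compact rebuild of the question's data frame (same 30 rows)
labels <- c("The_TW1", "cow_TW2", "jumped_TW3", "the_TW4", "gate_TW5")
df <- data.frame(
  sen = rep(c(1, 2), each = 15),
  trial = rep(rep(c("standard", "silence", "deviant"), each = 5), times = 2),
  ppt = rep(rep(c(1, 2, 3), each = 5), times = 2),
  ia_label = rep(labels, times = 6),
  target_pos = c(rep("0", 10), rep("TW3", 5), rep("0", 10), rep("TW4", 5)),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: names are sen values, values are the string to keep
patterns <- c("1" = "_TW3", "2" = "_TW4")

res <- df %>%
  group_by(sen) %>%
  # first(sen) is the group's sen value; fixed = TRUE matches the string literally
  filter(grepl(patterns[[as.character(first(sen))]], ia_label, fixed = TRUE)) %>%
  ungroup()

# res has 6 rows: jumped_TW3 for sen 1, the_TW4 for sen 2
```

A `sen` value missing from `patterns` would raise a subscript error here, so every group needs an entry in the lookup.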
To do this for the first `target_pos` value corresponding to `trial == "deviant"` by group, do this:
df %>%
group_by(sen) %>%
filter(grepl(
pattern = first(target_pos[trial == "deviant"]),
x = ia_label
))
# # A tibble: 6 × 5
# # Groups: sen [2]
# sen trial ppt ia_label target_pos
# <dbl> <chr> <dbl> <chr> <chr>
# 1 1 standard 1 jumped_TW3 0
# 2 1 silence 2 jumped_TW3 0
# 3 1 deviant 3 jumped_TW3 TW3
# 4 2 standard 1 the_TW4 0
# 5 2 silence 2 the_TW4 0
# 6 2 deviant 3 the_TW4 TW4
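For comparison, here is a rough base R sketch of the same idea without `group_by`. It looks up each `sen`'s deviant `target_pos` with `match()` and keeps rows whose `ia_label` ends in that suffix. The helper names `targets`, `target` and `keep` are assumptions of mine, and matching on the `_TWn` suffix with `endsWith()` is a swapped-in alternative to `grepl()`, chosen so that a target like `TW1` would not also match a label such as `x_TW10`:

```r
# Compact rebuild of the question's data frame (same 30 rows)
labels <- c("The_TW1", "cow_TW2", "jumped_TW3", "the_TW4", "gate_TW5")
df <- data.frame(
  sen = rep(c(1, 2), each = 15),
  trial = rep(rep(c("standard", "silence", "deviant"), each = 5), times = 2),
  ppt = rep(rep(c(1, 2, 3), each = 5), times = 2),
  ia_label = rep(labels, times = 6),
  target_pos = c(rep("0", 10), rep("TW3", 5), rep("0", 10), rep("TW4", 5)),
  stringsAsFactors = FALSE
)

# One row per sen holding the target_pos of its deviant trials
targets <- unique(df[df$trial == "deviant", c("sen", "target_pos")])

# Spread each sen's target over all of that sen's rows (NA if no deviant row)
target <- targets$target_pos[match(df$sen, targets$sen)]

# Keep rows whose label ends in "_<target>"; rows without a target are dropped
keep <- !is.na(target) & endsWith(df$ia_label, paste0("_", target))
res <- df[keep, ]

# res has 6 rows: jumped_TW3 for sen 1, the_TW4 for sen 2
```

This assumes each `sen` has a single distinct deviant `target_pos`; with several, `match()` would silently use the first one, much like `first()` in the dplyr version.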