I have data over individuals that looks like this:
df=data.frame(Col1=c("A","A","B","B"),Col2=c("A50","C50","B60","A70"),Col3=c("A40","A50","A50","A70"))
criteria="A50"
Col1 Col2 Col3
A A50 A40
A C50 A50
B A50 A50
B A70 A70
I want to pick each individual in Col1
that has the criteria
fulfilled (A50
) in any of the columns in at two different observations. That is, individual A
will be selected since he has A50
in two different observations. Individual B
will however not be selected since he only has A50
at in one observation, even if he became A50
two times during that observation.
The problem is an extension to this one: Subset multiple columns in R with multiple matches
Try using this with dplyr
1.0.0
library(dplyr)
cols <- c('Col2', 'Col3')
df %>%
group_by(Col1) %>%
filter(sum(colSums(cur_data()[cols] == criteria) >= 1) > 1)
# Col1 Col2 Col3
# <chr> <chr> <chr>
#1 A A50 A40
#2 A C50 A50
cur_data()[cols]
selects only cols
column, colSums
counts number of matches in each column and sum
ensures that the match is in different columns.