I have a dataframe
with multiple rows
per individual (id
). I want to obtain the count
of unique
individuals who has ever recorded 1
in A
and B
, but not necessarily on the same day (A
and B
can receive 1
in different rows
, e.g. date).
id <- as.character(c(108, 108, 111, 111, 111, 111, 153, 153, 153, 153, 153, 153))
date <- as.POSIXct(c("2014-03-12 08:44:18 UTC", "2015-09-16 02:56:00 UTC",
"2015-10-24 08:09:11 UTC", "2016-12-11 17:17:00 UTC", "2017-08-06 18:26:00 UTC",
"2018-01-29 00:00:00 UTC", "2014-04-17 08:40:10 UTC", "2015-09-16 02:56:00 UTC",
"2015-11-12 13:15:00 UTC", "2016-12-16 17:10:09 UTC", "2017-08-10 04:11:00 UTC",
"2018-01-29 00:00:00 UTC"))
A <- c(1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1)
B <- c(0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0)
df <- data.frame(id, date, A, B)
df
id date A B
1 108 2014-03-12 08:44:18 1 0
2 108 2015-09-16 02:56:00 1 0
3 111 2015-10-24 08:09:11 1 0
4 111 2016-12-11 17:17:00 1 0
5 111 2017-08-06 18:26:00 1 0
6 111 2018-01-29 00:00:00 0 1
7 153 2014-04-17 08:40:10 0 1
8 153 2015-09-16 02:56:00 0 1
9 153 2015-11-12 13:15:00 1 0
10 153 2016-12-16 17:10:09 1 0
11 153 2017-08-10 04:11:00 1 0
12 153 2018-01-29 00:00:00 1 0
I am using the function unique
:
> length(unique(df$patid[which(df$A== 1 & df$B==1)]))
[1] 0
But I do not get what I expect, once two individuals get 1
and A
and B
in different moments:
[1] 2
How could I get the right count for what I want? Thanks
A slightly different dplyr
approach which returns an integer. This makes use of any
, which returns TRUE
if any of the dates match the criteria.
df %>%
group_by(id) %>%
summarize(AB = any(A == 1) & any(B == 1)) %>%
pull(AB) %>%
sum()
#----
[1] 2