Search code examples
rdplyrcountunique

How can I obtain the number of unique ids that fits into two criteria non simultaneously (e.g. in different rows)?


I have a dataframe with multiple rows per individual (id). I want to obtain the count of unique individuals who has ever recorded 1 in A and B, but not necessarily on the same day (A and B can receive 1 in different rows, e.g. date).

id <- as.character(c(108, 108, 111, 111, 111, 111, 153, 153, 153, 153, 153, 153))
date <- as.POSIXct(c("2014-03-12 08:44:18 UTC", "2015-09-16 02:56:00 UTC",  
"2015-10-24 08:09:11 UTC", "2016-12-11 17:17:00 UTC", "2017-08-06 18:26:00 UTC", 
"2018-01-29 00:00:00 UTC", "2014-04-17 08:40:10 UTC", "2015-09-16 02:56:00 UTC", 
"2015-11-12 13:15:00 UTC", "2016-12-16 17:10:09 UTC", "2017-08-10 04:11:00 UTC", 
"2018-01-29 00:00:00 UTC"))
A <- c(1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1)
B <- c(0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0)
df <- data.frame(id, date, A, B)  
df

    id                date A B
1  108 2014-03-12 08:44:18 1 0
2  108 2015-09-16 02:56:00 1 0
3  111 2015-10-24 08:09:11 1 0
4  111 2016-12-11 17:17:00 1 0
5  111 2017-08-06 18:26:00 1 0
6  111 2018-01-29 00:00:00 0 1
7  153 2014-04-17 08:40:10 0 1
8  153 2015-09-16 02:56:00 0 1
9  153 2015-11-12 13:15:00 1 0
10 153 2016-12-16 17:10:09 1 0
11 153 2017-08-10 04:11:00 1 0
12 153 2018-01-29 00:00:00 1 0

I am using the function unique:

> length(unique(df$patid[which(df$A== 1 & df$B==1)]))
[1] 0

But I do not get what I expect, once two individuals get 1 and A and B in different moments:

[1] 2

How could I get the right count for what I want? Thanks


Solution

  • A slightly different dplyr approach which returns an integer. This makes use of any, which returns TRUE if any of the dates match the criteria.

    df %>% 
      group_by(id) %>%
      summarize(AB = any(A == 1) & any(B == 1)) %>%
      pull(AB) %>%
      sum()
    
    #----
    [1] 2