Search code examples
raggregatesummarize

Multiple summary counts and a flag across variable number of rows


I have the following starting point:

id.s <- c(1,1,2,2,2,3,3,3,3,4,4,4)
test.s <- c("Negative", "Positive", "Positive", "Negative", "Positive",
        "Negative", "Negative", "Negative", "Positive", "Negative",
        "Negative", "Negative")
Start <- as.data.frame(cbind(id.s,test.s))

And I am trying to get to:

id.f <- c(1,2,3,4)
Number.Of.Visits <- c(2,3,4,3)
Positive.Test <- c("Yes", "Yes", "Yes", "No")
Num.Positive <- c("1", "2", "1", "0")
finish <- as.data.frame(cbind(id.f, Number.Of.Visits, Positive.Test, Num.Positive))

Effectively: (1) IDs can have multiple visits for testing (2) They can test positive/negative any given visit (3) And I need to know for each ID (a) the number of visits, (b) whether there was any positive test, and (c) how many positive tests there were.

I'm sure I am making this more difficult than it should be. I can envision the pseudo-code, but can't translate that into R.

Any help would be ever so much appreciated.


Solution

  • We can do a group by 'id.s', use summarise to get the number of rows (n(), then check if there are any 'Positive' %in% 'test.s', as well as get the count of 'Positive' by taking the sum of logical vector

    library(dplyr)
    Start %>%
       group_by(id.s) %>%
       summarise(NumberOfVisits = n(),
                 Positive.Test = c('No', 'Yes')[1 + ('Positive' %in% test.s)], 
                Num.Positive = sum(test.s == 'Positive'), .groups = 'drop')
    

    -output

    # A tibble: 4 x 4
    #  id.s  NumberOfVisits Positive.Test Num.Positive
    #  <chr>          <int> <chr>                <int>
    #1 1                  2 Yes                      1
    #2 2                  3 Yes                      2
    #3 3                  4 Yes                      1
    #4 4                  3 No                       0