I have the following starting point:
id.s <- c(1,1,2,2,2,3,3,3,3,4,4,4)
test.s <- c("Negative", "Positive", "Positive", "Negative", "Positive",
"Negative", "Negative", "Negative", "Positive", "Negative",
"Negative", "Negative")
Start <- as.data.frame(cbind(id.s,test.s))
And I am trying to get to:
id.f <- c(1,2,3,4)
Number.Of.Visits <- c(2,3,4,3)
Positive.Test <- c("Yes", "Yes", "Yes", "No")
Num.Positive <- c("1", "2", "1", "0")
finish <- as.data.frame(cbind(id.f, Number.Of.Visits, Positive.Test, Num.Positive))
Effectively: (1) IDs can have multiple visits for testing (2) They can test positive/negative any given visit (3) And I need to know for each ID (a) the number of visits, (b) whether there was any positive test, and (c) how many positive tests there were.
I'm sure I am making this more difficult than it should be. I can envision the pseudo-code, but can't translate that into R.
Any help would be ever so much appreciated.
We can do a group by 'id.s', use summarise
to get the number of rows (n()
, then check if there are any
'Positive' %in%
'test.s', as well as get the count of 'Positive' by taking the sum
of logical vector
library(dplyr)
Start %>%
group_by(id.s) %>%
summarise(NumberOfVisits = n(),
Positive.Test = c('No', 'Yes')[1 + ('Positive' %in% test.s)],
Num.Positive = sum(test.s == 'Positive'), .groups = 'drop')
-output
# A tibble: 4 x 4
# id.s NumberOfVisits Positive.Test Num.Positive
# <chr> <int> <chr> <int>
#1 1 2 Yes 1
#2 2 3 Yes 2
#3 3 4 Yes 1
#4 4 3 No 0