I'm working with a clinical dataset of ~2500 unique ID's. Some of the ID's correspond to 20+ occurrences. I would like to see the sample type (NP, Throat, etc.) as well as the result of the test "Not Detected" or "Detected", but I would like to see them spread out over multiple columns and for the ID to basically be two rows. The first row all of the sample types for each occurrence and then the second row is the result for each occurrence. I can get the first row no problem, but I haven't been able to figure out how to add the second row on the same ID with the results below the corresponding sample type. Any help would be greatly appreciated!
ID <- c(1,1,2,2,3,3,3,4)
Type<-c("EM","EM","PA","PA","PA","PA","PA","EM")
Specimen_Type <- c("NP", "NP", "Throat", "Throat", "NP", "Throat", "Throat", "NP")
RESULT_VAL <- c("Not Detected", "Detected", "Not Detected", "Detected", "Not Detected", "Not Detected", "Detected", "Not Detected")
RESULT_DATE <- c("6-1-2020", "6-10-2020","6-1-2020", "6-10-2020","6-1-2020", "6-10-2020", "6-20-2020", "6-1-2020")
Data_sum<- data.frame(ID, Type, Specimen_Type, RESULT_VAL, RESULT_DATE)
I would like it to look like
ID Type Occurrence_1 Occurrence_2 Occurrence_3
1 EM NP NP
1 EM Not Detected Detected
2 PA Throat Throat
2 PA Not Detected Detected
3 PA NP Throat Throat
3 PA Not Detected Not Detected Detected
4 EM NP
4 EM Not Detected
We can reshape to 'long' and then to 'wide'
library(dplyr)
library(stringr)
library(tidyr)
library(data.table)
Data_sum %>%
pivot_longer(cols = c(Specimen_Type, RESULT_VAL)) %>%
arrange(ID, Type,
factor(name, levels = c('Specimen_Type', 'RESULT_VAL'))) %>%
mutate(rn = str_c('Occurence_', rowid(ID, Type, name))) %>%
select(-RESULT_DATE) %>%
pivot_wider(names_from = rn, values_from = value) %>%
select(-name)
# A tibble: 8 x 5
# ID Type Occurence_1 Occurence_2 Occurence_3
# <dbl> <chr> <chr> <chr> <chr>
#1 1 EM NP NP <NA>
#2 1 EM Not Detected Detected <NA>
#3 2 PA Throat Throat <NA>
#4 2 PA Not Detected Detected <NA>
#5 3 PA NP Throat Throat
#6 3 PA Not Detected Not Detected Detected
#7 4 EM NP <NA> <NA>
#8 4 EM Not Detected <NA> <NA>