I have this data:
drugData <- data.frame(caseID=c(9, 9, 10, 11, 12, 12, 12, 12, 13, 45, 45, 225),
Drug=c("Cocaine", "Cocaine", "DPT", "LSD", "Cocaine", "LSD", "Heroin","Heroin", "LSD", "DPT", "DPT", "Heroin"),
County=c("A", "A", "B", "C", "D", "D", "D","D", "E", "F", "F", "G"),
Date=c(2009, 2009, 2009, 2009, 2011, 2011, 2011, 2011, 2010, 2010, 2010, 2005))
"CaseID" rows make up a single case, which may have observations of all the same drug, or different types of drugs. I want this data to look like the following:
CaseID Drug.1 Drug.2 Drug. 3 Drug.4 County Date
9 Cocaine Cocaine NA NA A 2009
10 DPT LSD NA NA B 2009
11 LSD NA NA NA C 2009
12 Cocaine LSD Heroin Heroin D 2011
13 LSD NA NA NA E 2010
45 DPT DPT NA NA F 2010
225 Heroin NA NA NA G 2005
I've tried using dplyr spread function but can't seem to quite get this to work.
We can pivot to wide format after creating a sequence column based on 'caseID'
library(dplyr)
library(tidyr)
library(stringr)
library(data.table)
drugData %>%
mutate(nm = str_c('Drug', rowid(caseID))) %>%
pivot_wider(names_from = nm, values_from = Drug)
#A tibble: 7 x 7
# caseID County Date Drug1 Drug2 Drug3 Drug4
# <dbl> <fct> <dbl> <fct> <fct> <fct> <fct>
#1 9 A 2009 Cocaine Cocaine <NA> <NA>
#2 10 B 2009 DPT <NA> <NA> <NA>
#3 11 C 2009 LSD <NA> <NA> <NA>
#4 12 D 2011 Cocaine LSD Heroin Heroin
#5 13 E 2010 LSD <NA> <NA> <NA>
#6 45 F 2010 DPT DPT <NA> <NA>
#7 225 G 2005 Heroin <NA> <NA> <NA>
Or with spread
(spread
is deprecated in place of pivot_wider
drugData %>%
mutate(nm = str_c('Drug', rowid(caseID))) %>%
spread(nm, Drug)
Or using data.table
dcast(setDT(drugData), caseID + County + Date ~
paste0('Drug', rowid(caseID)), value.var = 'Drug')
# caseID County Date Drug1 Drug2 Drug3 Drug4
#1: 9 A 2009 Cocaine Cocaine <NA> <NA>
#2: 10 B 2009 DPT <NA> <NA> <NA>
#3: 11 C 2009 LSD <NA> <NA> <NA>
#4: 12 D 2011 Cocaine LSD Heroin Heroin
#5: 13 E 2010 LSD <NA> <NA> <NA>
#6: 45 F 2010 DPT DPT <NA> <NA>
#7: 225 G 2005 Heroin <NA> <NA> <NA>