A dataframe has columns code1
, code2
, code3
etc., with codes stored in them. For each code, there is a column with number of days used days_used1
, days_used2
, days_used3
etc. I want to get a dataframe including a column for each code which has the days_used summed for each of the values in code
.
data.frame(ID = c(1, 2, 3, 4, 5,6),
code1 = c('D', 'D', 'C', 'A', 'G', "A"),
quantity1 = c(10, 10, 15, 30, 20, 10),
dose1 = c(5, 5, 15, 0.5, 2, 10),
days_used1 = c(NA, NA, 90, 63, 20, 50),
code2 = c('A', 'H', 'D', 'G', 'A', "A"),
quantity2 = c(5, 10, 10, 20, 20, 5),
dose2 = c(0.1, 15, 15, 7, 5, 12),
days_used2 = c(15, NA, NA, 50, 20, 10),
code3 = c('A', 'H', 'C', 'A', 'D', "D"),
quantity3 = c(10, 10, 15, 30, 20, 10),
dose3 = c(5, 5, 15, 0.5, 2, 10),
days_used3 = c(15, NA, NA, 50, 20, 10)
)
So I want a dataframe looking like this:
data.frame(ID = c(1, 2, 3, 4, 5),
A= c(30, 0, 0, 20, 60),
D = c(0, 0, 0, 0, 20, 10),
C= c(0, 0, 90, 0, 20, 10),
G= c(0, 0, 0, 50, 20, 0),
H= c(0, 0, 0, 0, 0)
)
I don't really know how I can get this done. I tried something along the ways of the example below, but this obviously does not work
ifelse(test[paste0("code",c(1:3))] == "A",
sum(test[paste0("code",c(1:3)) == "A"]+1),
0)
code1 code2 code3
[1,] 0 0 0
[2,] 0 0 0
[3,] 0 0 0
[4,] 0 0 0
[5,] 0 0 0
[6,] 0 0 0
library(data.table)
dcast(
melt(setDT(test), id=1L,
measure=patterns("^code", "^days"),
value.name=c("Code", "Days"))[,
.(sumDays = sum(Days, na.rm = T)),
.(ID, Code)],
ID ~ Code, value.var = c("sumDays"), fill = 0)
#> ID A C D G H
#> 1: 1 30 0 0 0 0
#> 2: 2 0 0 0 0 0
#> 3: 3 0 90 0 0 0
#> 4: 4 113 0 0 50 0
#> 5: 5 20 0 20 20 0
#> 6: 6 60 0 10 0 0
Created on 2024-04-23 with reprex v2.0.2