Search code examples
rdataframesum

How to get a total sum based on a certain columns containing the same value


A dataframe has columns code1, code2, code3 etc., with codes stored in them. For each code, there is a column with number of days used days_used1, days_used2, days_used3 etc. I want to get a dataframe including a column for each code which has the days_used summed for each of the values in code.

data.frame(ID = c(1, 2, 3, 4, 5,6),
            code1 = c('D', 'D', 'C', 'A', 'G', "A"),
           quantity1 = c(10, 10, 15, 30, 20, 10),
           dose1 = c(5, 5, 15, 0.5, 2, 10), 
           days_used1 = c(NA, NA, 90, 63, 20, 50),
           code2 = c('A', 'H', 'D', 'G', 'A', "A"),
           quantity2 = c(5, 10, 10, 20, 20, 5),
           dose2 = c(0.1, 15, 15, 7, 5, 12), 
           days_used2 = c(15, NA, NA, 50, 20, 10),
           code3 = c('A', 'H', 'C', 'A', 'D', "D"),
           quantity3 = c(10, 10, 15, 30, 20, 10),
           dose3 = c(5, 5, 15, 0.5, 2, 10), 
           days_used3 = c(15, NA, NA, 50, 20, 10)
 )

So I want a dataframe looking like this:

data.frame(ID = c(1, 2, 3, 4, 5),
                  A= c(30, 0, 0, 20, 60),
                  D = c(0, 0, 0, 0, 20, 10),
                  C= c(0, 0, 90, 0, 20, 10),
                  G= c(0, 0, 0, 50, 20, 0),
                  H= c(0, 0, 0, 0, 0)
)

I don't really know how I can get this done. I tried something along the ways of the example below, but this obviously does not work

ifelse(test[paste0("code",c(1:3))] == "A",
         sum(test[paste0("code",c(1:3)) == "A"]+1), 
         0)
     code1 code2 code3
[1,]     0     0     0
[2,]     0     0     0
[3,]     0     0     0
[4,]     0     0     0
[5,]     0     0     0
[6,]     0     0     0

Solution

  • library(data.table)
    
    dcast(
          melt(setDT(test), id=1L,
               measure=patterns("^code", "^days"),
               value.name=c("Code", "Days"))[,
                .(sumDays = sum(Days, na.rm = T)), 
                .(ID, Code)],
          ID  ~ Code, value.var = c("sumDays"), fill = 0)
    
    #>    ID   A  C  D  G H
    #> 1:  1  30  0  0  0 0
    #> 2:  2   0  0  0  0 0
    #> 3:  3   0 90  0  0 0
    #> 4:  4 113  0  0 50 0
    #> 5:  5  20  0 20 20 0
    #> 6:  6  60  0 10  0 0
    

    Created on 2024-04-23 with reprex v2.0.2