Search code examples
rgroup-bytransposesummarize

Group, Summarize and transpose in one step


I have a dataframe that looks like this

      Vehicle Model Month Sales
      A   XXY     1    10
      A   XXY     1   100
      A   XXY     2    40
      A   XXY     3    10
      A   YYX     3    10
      A   YYX     3    33
      B   ZZZ     1    50
      B   ZZY     2    60

I want to be able to transform it as follows:

      Vehicle Model 1 2 3 4 5
       A      XXY  2  1  1  0  0
       A      YYX  0  0  2  0  0
       B      ZZZ  1  0  0  0  0
       B      ZZY  0  1  0  0  0

So essentially I want to - Group by 2 fields( Vehicle, Model) and then Count the number of record in the column "Sales" BY Month, and transpose the data so that Month becomes column and Vehicle/Model my rows. Also some models might not have up to 12 months, but I want to display all the columns 1 to 12, leaving 0 if no data available

The dataframe is quite large. Any recommendation? Thanks


Solution

  • Not sure about the "one step" part of the question. Doing it in several steps doesn't work?

    library(data.table)
    s <- "      Vehicle Model Month Sales
          A   XXY     1    10
          A   XXY     1   100
          A   XXY     2    40
          A   XXY     3    10
          A   YYX     3    10
          A   YYX     3    33
          B   ZZZ     1    50
          B   ZZY     2    60"
    dt <- fread(s)
    
    dt[, sale_count := .N, by = .(Vehicle, Model, Month)]
    dt[, Sales := NULL]
    dt <- unique(dt)
    dcast(dt, Vehicle + Model ~ Month, value.var = "sale_count")
    

    result: there is NA instead of 0 for months not available. You can change it to 0 if you want.

       Vehicle Model  1  2  3
    1:       A   XXY  2  1  1
    2:       A   YYX NA NA  2
    3:       B   ZZY NA  1 NA
    4:       B   ZZZ  1 NA NA