I have a dataframe that looks like this
Vehicle Model Month Sales
A XXY 1 10
A XXY 1 100
A XXY 2 40
A XXY 3 10
A YYX 3 10
A YYX 3 33
B ZZZ 1 50
B ZZY 2 60
I want to be able to transform it as follows:
Vehicle Model 1 2 3 4 5
A XXY 2 1 1 0 0
A YYX 0 0 2 0 0
B ZZZ 1 0 0 0 0
B ZZY 0 1 0 0 0
So essentially I want to - Group by 2 fields( Vehicle, Model) and then Count the number of record in the column "Sales" BY Month, and transpose the data so that Month becomes column and Vehicle/Model my rows. Also some models might not have up to 12 months, but I want to display all the columns 1 to 12, leaving 0 if no data available
The dataframe is quite large. Any recommendation? Thanks
Not sure about the "one step" part of the question. Doing it in several steps doesn't work?
library(data.table)
s <- " Vehicle Model Month Sales
A XXY 1 10
A XXY 1 100
A XXY 2 40
A XXY 3 10
A YYX 3 10
A YYX 3 33
B ZZZ 1 50
B ZZY 2 60"
dt <- fread(s)
dt[, sale_count := .N, by = .(Vehicle, Model, Month)]
dt[, Sales := NULL]
dt <- unique(dt)
dcast(dt, Vehicle + Model ~ Month, value.var = "sale_count")
result: there is NA instead of 0 for months not available. You can change it to 0 if you want.
Vehicle Model 1 2 3
1: A XXY 2 1 1
2: A YYX NA NA 2
3: B ZZY NA 1 NA
4: B ZZZ 1 NA NA