Search code examples
rdataframedcast

Casting multiple value.var controled by fun.aggregate


I have the following dataset

client_id <- c("A", "A", "B", "B", "B", "B", "B", "A", "A", "B", "B")
value <- c(10, 35, 20, 30, 50, 40, 30, 40, 30, 40, 10)
period_30 <- c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0)
period_60 <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
sign <- c("D", "D", "D", "D", "C", "C", "C", "D", "D", "D", "D")

data <- data.frame(client_id, value, period_30, period_60, sign)

I can use this code to count the number of different splits per given period with the code below:

library(data.table)
test<- dcast(setDT(data), client_id ~ paste0("period_30", sign), value.var = "period_30", sum)

But I would like to also calculate the value as per the different splits.

The expected outcome would look like this:

client_id       av.value_period_30_sign_D   av.value_period_60_sign_D   av.value_period_30_sign_C   av.value_period_30_sign_D
    A                     34.16667                      NaN                  NaN                                   NaN
    B                     30.00000                    34.16667               NaN                               27.50000

And then, it should be extendable to additional splits, like average value of sign X, of type X in period 1.

I am not sure if the desired output is doable with this approach. But I was looking at the fun.aggregate argument. Perhaps it could be used in combination with multiple value.var arguments?

Update: Joel's code answers the first part of the question.

client_id   sign    period_30   period_60 
    A         D     34.16667    34.16667
    B         D     30.00000    34.16667
    B         C     NaN         27.50000

But how do I transpose the variables and assign the names as per the splits automatically?


Solution

  • another method(would be faster) is using data.table

    Based on the edit made to the question :(hope the code is self explanatory now)

    library(data.table)
    data1 <- setDT(data)[, lapply(.SD, function(x) mean(value[x==1])),
                          .SDcols = period_30:period_60,
                          by = .(client_id, sign)]
    # `dcast` if also from `data.table` package
    dcast(data1, client_id~sign, drop = FALSE, value.var = c("period_30", "period_60"))
    #   client_id period_30_C period_30_D period_60_C period_60_D
    #1:         A          NA    34.16667          NA    34.16667
    #2:         B         NaN    30.00000        27.5    34.16667