Search code examples
rdplyraggregatemeangroup-summaries

Aggregate by multiple columns and reshape from long to wide


There are some questions similar to this topic on SO but not exactly like my usecase. I have a dataset where the columns are laid out as shown below

     Id        Description          Value
     10        Cat                  19
     10        Cat                  20
     10        Cat                  5
     10        Cat                  13
     11        Cat                  17
     11        Cat                  23
     11        Cat                  7
     11        Cat                  14  
     10        Dog                  19
     10        Dog                  20
     10        Dog                  5
     10        Dog                  13
     11        Dog                  17
     11        Dog                  23
     11        Dog                  7
     11        Dog                  14    

What I am trying to do is capture the mean of the Value column by Id, Description. The final dataset would look like this.

     Id       Cat         Dog 
     10       14.25       28.5
     11       15.25       15.25

I can do this in a very rough manner not very efficient like this

tempdf1 <- df %>%
  filter(str_detect(Description, "Cat")) %>%
   group_by(Id, Description) %>%
  summarize(Mean_Value = mean(Value) , na.rm = TRUE))

This is not very convenient. Any advise on how how to accomplish the expected results more efficiently is much appreciated.


Solution

  • Use dcast or even acast from reshape2() package

    dcast(dat,Id~Description,mean)
       Id   Cat   Dog
     1 10 14.25 14.25
     2 11 15.25 15.25
    

    Base R might be abit longer:

     reshape(aggregate(.~Id+Description,dat,mean),direction = "wide",v.names  = "Value",idvar = "Id",timevar = "Description")
      Id Value.Cat Value.Dog
    1 10     14.25     14.25
    2 11     15.25     15.25