
How to create summaries of subgroups based on factors in R


I want to calculate the mean of each numeric variable in the following example. The means need to be grouped once by each level of "id" and once by each level of "status".

set.seed(10)
dfex <- data.frame(id = c("2", "1", "1", "1", "3", "2", "3"),
                   status = c("hit", "miss", "miss", "hit", "miss", "miss", "miss"),
                   var3 = rnorm(7), var4 = rnorm(7), var5 = rnorm(7), var6 = rnorm(7))

For the means of the "id" groups, the first row of output would be labeled "mean-id-1", followed by rows labeled "mean-id-2" and "mean-id-3". For the means of the "status" groups, the rows would be labeled "mean-status-miss" and "mean-status-hit". My objective is to generate these means and their row labels programmatically.

I've tried many different combinations of the apply functions, but each has issues. I've also experimented with the aggregate function.


Solution

  • With base R the following works for the "id" column:

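    # Mean of every "var" column within each level of id
    means_id <- aggregate(dfex[, grep("var", names(dfex))], list(dfex$id), mean)
    # Build the row labels from the grouping column, then drop that column
    rownames(means_id) <- paste0("mean-id-", means_id$Group.1)
    means_id$Group.1 <- NULL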
    

    Output:

                    var3       var4       var5       var6
    mean-id-1 -0.7182503 -0.2604572 -0.3535823 -1.3530417
    mean-id-2  0.2042702 -0.3009548  0.6121843 -1.4364211
    mean-id-3 -0.4567655  0.8716131  0.1646053 -0.6229102
    

    The same for the "status" column:

    # Same steps, grouping by status instead of id
    means_status <- aggregate(dfex[, grep("var", names(dfex))], list(dfex$status), mean)
    rownames(means_status) <- paste0("mean-status-", means_status$Group.1)
    means_status$Group.1 <- NULL
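
Since the two snippets above differ only in the grouping column, they could be folded into one function. The following is a minimal sketch, not the only way to do it: group_means() is a hypothetical helper name (not part of base R), and the anchored pattern "^var" assumes that every numeric column to summarise starts with "var".

```r
# Example data from the question
set.seed(10)
dfex <- data.frame(id = c("2", "1", "1", "1", "3", "2", "3"),
                   status = c("hit", "miss", "miss", "hit", "miss", "miss", "miss"),
                   var3 = rnorm(7), var4 = rnorm(7), var5 = rnorm(7), var6 = rnorm(7))

# Hypothetical helper: per-group means of value_cols, with rows labeled
# "mean-<group_col>-<level>"
group_means <- function(df, group_col, value_cols) {
  out <- aggregate(df[, value_cols, drop = FALSE],
                   by = list(df[[group_col]]),
                   FUN = mean)
  rownames(out) <- paste0("mean-", group_col, "-", out$Group.1)
  out$Group.1 <- NULL
  out
}

# Assumption: the columns to summarise are exactly those starting with "var"
value_cols <- grep("^var", names(dfex), value = TRUE)

# One block of rows per grouping column, stacked into a single data frame
all_means <- do.call(rbind, lapply(c("id", "status"), function(g) {
  group_means(dfex, g, value_cols)
}))

print(all_means)
```

aggregate() returns the groups in sorted order, so the "status" rows come out as "mean-status-hit" followed by "mean-status-miss". If a different order is needed, the rows can be reindexed by name afterwards, e.g. all_means[c("mean-status-miss", "mean-status-hit"), ].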