
R - aggregate results in disparity between names(data) and columns in head(data)


I have a long-format dataset called individualData with 3 factors (strain, genotype, region) and 1 value (volume). What I'm trying to do is calculate the mean and standard deviation of volume for every combination of strain * genotype * region, except for combinations without any data (since genotype labels depend on the strain). The following command seems to do this, since it produces the expected number of rows:

  summaryData = aggregate( .~strain:genotype:region, individualData, FUN = function(x) c(mn=mean(x), stdev=sd(x)))

The problem is that head(summaryData) gives me 5 columns (volume is replaced with volume.mn and volume.stdev), as I would have expected, but names(summaryData) or colnames(summaryData) gives me only 4 columns, namely my original columns. How do I refer to the columns properly? I just want to collapse this into a data.frame that I understand how to work with. Does anyone with more experience with the aggregate function know how to do this?

Thanks!


Solution

  • First, here's some reproducible sample data which I'm assuming matches your structure:

    set.seed(15)
    individualData <- data.frame(
        volume = runif(120),
        expand.grid(region=1:2, genotype=1:3, strain=1:2)
    )
    

    Then you're running

    summaryData = aggregate( .~strain:genotype:region, individualData, 
        FUN = function(x) c(mn=mean(x), stdev=sd(x)))
    

    and if you look at the structure of what's returned, you get

    str(summaryData)
    # 'data.frame':   12 obs. of  4 variables:
    #  $ strain  : int  1 2 1 2 1 2 1 2 1 2 ...
    #  $ genotype: int  1 1 2 2 3 3 1 1 2 2 ...
    #  $ region  : int  1 1 1 1 1 1 2 2 2 2 ...
    #  $ volume  : num [1:12, 1:2] 0.526 0.409 0.407 0.445 0.566 ...
    #   ..- attr(*, "dimnames")=List of 2
    #   .. ..$ : NULL
    #   .. ..$ : chr  "mn" "stdev"
    

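    so aggregate has actually stuffed a two-column matrix into the single volume column. That's why names(summaryData) reports only 4 columns while head(summaryData) prints 5: the matrix columns are displayed as volume.mn and volume.stdev. You can index these values with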

    summaryData$volume[,"mn"]
    summaryData$volume[,"stdev"]
    

    or turn it into a proper data.frame with

    summaryData <- do.call(data.frame, summaryData)
    summaryData$volume.mn
    summaryData$volume.stdev
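
    As a quick check of the above, here's a minimal sketch that rebuilds the sample data, confirms where the "missing" column comes from, and then flattens the result. The names flatData and flat are just illustrative, and the cbind variant is an assumption on my part about what you might prefer (it drops the volume. prefix); it's not the only way to do this.

    ```r
    # Rebuild the sample data and the aggregate call from above
    set.seed(15)
    individualData <- data.frame(
        volume = runif(120),
        expand.grid(region = 1:2, genotype = 1:3, strain = 1:2)
    )
    summaryData <- aggregate(. ~ strain:genotype:region, individualData,
        FUN = function(x) c(mn = mean(x), stdev = sd(x)))

    # names() sees 4 columns because the whole volume matrix counts as one
    length(names(summaryData))    # 4
    dim(summaryData$volume)       # 12 2

    # do.call(data.frame, ...) splits the matrix into one column per statistic
    flatData <- do.call(data.frame, summaryData)
    names(flatData)  # "strain" "genotype" "region" "volume.mn" "volume.stdev"

    # Illustrative alternative: bind the matrix columns on under their own names
    flat <- cbind(summaryData[c("strain", "genotype", "region")],
                  as.data.frame(summaryData$volume))
    names(flat)      # "strain" "genotype" "region" "mn" "stdev"
    ```

    Either way you end up with an ordinary data.frame where names() and head() agree, so you can subset or plot it as usual.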