
Calculating standard error of the mean from multiple files in a directory in R


I have several hundred text files in a directory. Each file holds a table of 225 rows and 50 columns, with the same row names and column names in every file. All values are numeric, and I need to generate one data frame in which each cell is the standard error of the mean of that cell across all of these text files.

There is plenty of code for building one master data frame that holds the average of each cell across all text files in a directory, but none for building one that holds the standard error of the mean in every cell.

For example, this reads all the text files and generates one master data frame that holds the average of each cell across the files.

txt <- lapply(list.files(pattern = "\\.txt$"), read.delim)
Z <- Reduce("+", txt) / length(txt)

Which gives one data frame that looks like this:

>head(Z)
      C1   C2  C3 
Row_1 20   22  25
Row_2 14   9   22

But these are averages of all text files combined into one data frame. I would like standard errors of the mean instead, and unfortunately I haven't found posts that generate this result. There are plenty of posts that take the standard error of the columns of one data frame, but none that do it cell by cell across many files stored in a directory.

I have tried this, but unfortunately it does not work:

SE <- Reduce("sd", txt) / sqrt(length(txt))

Any help would be greatly appreciated. Thank-you.


Solution

  • One option is to unlist the list of data frames, reshape the values into a 3-D array (rows x columns x files), and apply a standard-error function, such as std.error from plotrix, over the first two dimensions

    library(plotrix)
    dim1 <- c(dim(txt[[1]]), length(txt))
    apply(array(unlist(txt), dim1), 1:2, std.error)
    #          [,1]      [,2]     [,3]      [,4]
    #[1,] 1.666667 1.2018504 1.452966 1.7638342
    #[2,] 2.081666 1.5275252 1.527525 2.3333333
    #[3,] 2.027588 0.8819171 1.855921 0.8819171
    

    which gives the same result as the formula the OP was attempting, sd(x)/sqrt(length(x)):

    apply(array(unlist(txt), dim1), 1:2,  function(x) sd(x)/sqrt(length(x)))
    #        [,1]      [,2]     [,3]      [,4]
    #[1,] 1.666667 1.2018504 1.452966 1.7638342
    #[2,] 2.081666 1.5275252 1.527525 2.3333333
    #[3,] 2.027588 0.8819171 1.855921 0.8819171
    

    The same array approach can also be used to calculate the mean, and it matches the Reduce result:

    Reduce(`+`, txt)/length(txt)
    #        V1       V2       V3       V4
    #1 5.333333 6.333333 5.333333 4.666667
    #2 4.000000 3.000000 4.000000 5.333333
    #3 4.666667 4.666667 6.666667 6.666667
    
    apply(array(unlist(txt), dim1), 1:2, mean)
    #         [,1]     [,2]     [,3]     [,4]
    #[1,] 5.333333 6.333333 5.333333 4.666667
    #[2,] 4.000000 3.000000 4.000000 5.333333
    #[3,] 4.666667 4.666667 6.666667 6.666667
    
    apply(array(unlist(txt), dim1), 2, rowMeans)
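    
    Since the question started from Reduce, here is a minimal sketch of a Reduce-only route to the same standard errors, assuming every data frame is fully numeric with identical dimensions. It accumulates the cell-wise sum and sum of squares and derives the sample variance from them. The names s1, s2, v and se_reduce are illustrative, and the toy txt list simply repeats the data block so the snippet runs on its own.

    ```r
    # Toy stand-in for the list of data frames (same as the 'data' block)
    set.seed(24)
    txt <- lapply(1:3, function(i) as.data.frame(matrix(sample(1:9, 3 * 4,
          replace = TRUE), 3, 4)))

    n  <- length(txt)
    s1 <- as.matrix(Reduce(`+`, txt))                            # cell-wise sum of x
    s2 <- as.matrix(Reduce(`+`, lapply(txt, function(d) d^2)))   # cell-wise sum of x^2

    # Sample variance per cell: (sum(x^2) - sum(x)^2 / n) / (n - 1).
    # pmax() guards against tiny negative values caused by rounding.
    v <- pmax((s2 - s1^2 / n) / (n - 1), 0)
    se_reduce <- sqrt(v / n)                                     # standard error of the mean

    # Cross-check against the apply() version from the answer
    dim1 <- c(dim(txt[[1]]), n)
    se_apply <- apply(array(unlist(txt), dim1), 1:2,
                      function(x) sd(x) / sqrt(length(x)))
    stopifnot(isTRUE(all.equal(unname(se_reduce), se_apply)))
    ```

    This avoids building the 3-D array, but the sum-of-squares formula can lose precision when the values are large relative to their spread, so the apply version is probably the safer default.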
    

    data

    set.seed(24)
    txt <- lapply(1:3, function(i) as.data.frame(matrix(sample(1:9, 3 * 4, 
          replace = TRUE), 3, 4)))
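
Note that array(unlist(txt), dim1) drops the row and column names. As a rough end-to-end sketch, assuming each file is tab-delimited with the row names in its first column (an assumption about the file layout, not something stated in the question), the files can be read as matrices and stacked with simplify2array so the names carry through. dir_path, the demo files and the object names below are hypothetical.

```r
# Hypothetical setup: write three small tab-delimited files to a temp
# directory so the sketch runs on its own; point dir_path at the real
# directory and drop this loop when using actual data.
dir_path <- file.path(tempdir(), "se_demo")
dir.create(dir_path, showWarnings = FALSE)
set.seed(24)
for (i in 1:3) {
  m <- matrix(sample(1:9, 3 * 4, replace = TRUE), 3, 4,
              dimnames = list(paste0("Row_", 1:3), paste0("C", 1:4)))
  write.table(m, file.path(dir_path, paste0("file", i, ".txt")),
              sep = "\t", quote = FALSE, col.names = NA)
}

# Read every .txt file as a numeric matrix, using column 1 as row names
files <- list.files(dir_path, pattern = "\\.txt$", full.names = TRUE)
mats  <- lapply(files, function(f) as.matrix(read.delim(f, row.names = 1)))

# Stack into a rows x columns x files array; dimnames come from the first matrix
arr <- simplify2array(mats)

# Standard error of the mean for each cell across files
se <- apply(arr, 1:2, function(x) sd(x) / sqrt(length(x)))
se
```

The result is a matrix with the original row and column names; wrap it in as.data.frame(se) if a data frame is needed.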