I have several hundred text files in a directory. Each file contains a numeric table of 225 rows by 50 columns, with the same row and column names in every file. I need to generate one data frame in which each cell holds the standard error of the mean of that cell across all of the files.
There is plenty of code for building one master data frame that holds the average of each cell across all the text files in a directory, but I have found none that does the same for the standard error of the mean.
For example, this reads all the text files and generates one master data frame holding the average of each cell across the files:
txt <- lapply(list.files(pattern = "\\.txt$"), read.delim)
Z <- Reduce("+", txt) / length(txt)
Which gives one data frame that looks like this:
> head(Z)
      C1 C2 C3
Row_1 20 22 25
Row_2 14  9 22
But these are averages across all the text files. I would like standard errors of the mean instead, and unfortunately I haven't found any posts that produce this result. There are plenty of posts that take the standard error of the columns of a single data frame, but none that do it cell by cell across many files stored in a directory.
I have tried this, but unfortunately it does not work:
SE <- Reduce("sd", txt) / sqrt(length(txt))
Any help would be greatly appreciated. Thank you.
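One option would be to unlist the list of data frames, reshape the values into a 3-D array (rows x columns x files), and apply one of the existing functions that calculate the standard error, such as std.error from the plotrix package: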
library(plotrix)
dim1 <- c(dim(txt[[1]]), length(txt))
apply(array(unlist(txt), dim1), 1:2, std.error)
#         [,1]      [,2]     [,3]      [,4]
#[1,] 1.666667 1.2018504 1.452966 1.7638342
#[2,] 2.081666 1.5275252 1.527525 2.3333333
#[3,] 2.027588 0.8819171 1.855921 0.8819171
This matches the result of the formula the OP was attempting, sd(x)/sqrt(length(x)), applied to each cell:
apply(array(unlist(txt), dim1), 1:2, function(x) sd(x)/sqrt(length(x)))
#         [,1]      [,2]     [,3]      [,4]
#[1,] 1.666667 1.2018504 1.452966 1.7638342
#[2,] 2.081666 1.5275252 1.527525 2.3333333
#[3,] 2.027588 0.8819171 1.855921 0.8819171
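To see why this works, it may help to check how the array is laid out: with dimensions (rows, columns, files), the slice arr[i, j, ] holds cell (i, j) from every file, and that vector is what apply(..., 1:2, ...) passes to the function. A minimal sketch, using small made-up data frames in place of the real files (the values and the names arr, cell_12, manual, and se_12 are illustrative only):

```r
# Made-up stand-ins for three files, each a 2 x 2 table
txt <- list(
  data.frame(V1 = c(1, 2),  V2 = c(3, 4)),
  data.frame(V1 = c(5, 6),  V2 = c(7, 8)),
  data.frame(V1 = c(9, 10), V2 = c(11, 12))
)

# Stack the data frames into a rows x columns x files array
dim1 <- c(dim(txt[[1]]), length(txt))
arr  <- array(unlist(txt), dim1)

# Cell (row 1, column 2) across all files, taken from the array ...
cell_12 <- arr[1, 2, ]

# ... and the same values pulled from each data frame directly
manual <- sapply(txt, function(d) d[1, 2])

# Standard error of the mean for that single cell
se_12 <- sd(cell_12) / sqrt(length(cell_12))
```

Here cell_12 and manual contain the same three values, so the per-cell function really does see one observation per file.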
The array approach can also be used to calculate the mean, giving the same values as the Reduce version:
Reduce(`+`, txt)/length(txt)
#        V1       V2       V3       V4
#1 5.333333 6.333333 5.333333 4.666667
#2 4.000000 3.000000 4.000000 5.333333
#3 4.666667 4.666667 6.666667 6.666667
apply(array(unlist(txt), dim1), 1:2, mean)
#         [,1]     [,2]     [,3]     [,4]
#[1,] 5.333333 6.333333 5.333333 4.666667
#[2,] 4.000000 3.000000 4.000000 5.333333
#[3,] 4.666667 4.666667 6.666667 6.666667
apply(array(unlist(txt), dim1), 2, rowMeans)
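If you would rather stay with the Reduce pattern from the question, one possible alternative (not part of the approach above) is to accumulate the sum and the sum of squares of each cell and derive the standard error from those. This is only a sketch under the assumption that all data frames are numeric and share the same shape; the data and the names sum_x, sum_x2, var_x, se_reduce, and se_apply are made up for illustration:

```r
# Made-up stand-ins for the data frames read from the text files
txt <- list(
  data.frame(V1 = c(1, 2), V2 = c(3, 4)),
  data.frame(V1 = c(5, 6), V2 = c(7, 9)),
  data.frame(V1 = c(9, 1), V2 = c(2, 12))
)
n    <- length(txt)
mats <- lapply(txt, as.matrix)

# Element-wise sum and sum of squares across all files
sum_x  <- Reduce(`+`, mats)
sum_x2 <- Reduce(`+`, lapply(mats, function(m) m^2))

# Sample variance per cell: (sum(x^2) - sum(x)^2 / n) / (n - 1)
var_x <- (sum_x2 - sum_x^2 / n) / (n - 1)

# Standard error per cell; pmax() guards against tiny negative
# values caused by floating-point rounding
se_reduce <- sqrt(pmax(var_x, 0) / n)

# Cross-check against the array-based version
se_apply <- apply(array(unlist(txt), c(dim(txt[[1]]), n)), 1:2,
                  function(x) sd(x) / sqrt(length(x)))
```

The sum-of-squares formula avoids building the 3-D array, but it can lose precision when the values are large relative to their spread, so the apply version is the safer default.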
Data used in the examples above:
set.seed(24)
txt <- lapply(1:3, function(i) as.data.frame(matrix(sample(1:9, 3 * 4,
    replace = TRUE), 3, 4)))
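Putting the pieces together for files on disk, the following is a rough end-to-end sketch. It assumes tab-delimited files whose first column holds the row names. The temporary directory, the file names, and the demo values are all hypothetical and exist only to make the example self-contained; base R's sd is used in place of plotrix's std.error so that no extra package is needed.

```r
# Hypothetical demo files: three small tab-delimited tables in a temp directory
demo_dir <- file.path(tempdir(), "se_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  m <- matrix(c(1, 2, 3, 4) * i, nrow = 2,
              dimnames = list(c("Row_1", "Row_2"), c("C1", "C2")))
  write.table(m, file.path(demo_dir, paste0("file", i, ".txt")),
              sep = "\t", quote = FALSE, col.names = NA)
}

# Read every .txt file, using the first column as row names
files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
txt   <- lapply(files, read.delim, row.names = 1)

# Stack into a rows x columns x files array and take the SE of each cell
arr <- array(unlist(txt), c(dim(txt[[1]]), length(txt)))
se  <- apply(arr, 1:2, function(x) sd(x) / sqrt(length(x)))

# Restore the row and column names and return a data frame
dimnames(se) <- dimnames(txt[[1]])
se <- as.data.frame(se)
```

With real data, demo_dir would be replaced by the directory holding the text files, and the loop that writes the demo files would be dropped.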