r statistics aggregate confidence-interval group

R calculate confidence interval for mean in aggregate, by function

I am intended to create a script that can generate a statistical information, grouped based on category (which in the example below; SDP & M346). I need the n, mean, standard deviation (sd), standard error (se) and confidence interval. I have the code ready for all except the confidence interval. I could not manage to embed the calculation of the confidence interval in group using aggregate function.

tab.sep.file

I was trying to print the mean/sd/se/confidence interval grouped by the column SDP and M346.

aggregate(tab.sep.file$Hic8, FUN=mean, by=list(SDP=tab.sep.file$SDP, trait=tab.sep.file$M346))

By switching the function to length, standard deviation and creating formula for standard error, I managed to obtain the statistics for grouped data in one go. The output for the script above is something as below:

To calculate the confidence interval for the entire data (not grouping):

mean <- mean(tab.sep.file$Hic8)
n <- length(tab.sep.file$Hic8)
standard_deviation <- sd(tab.sep.file$Hic8)
standard_error <- standard_deviation / sqrt(n)
alpha = 0.05
degrees_of_freedom = n - 1
t_score = qt(p=alpha/2, df=degrees_of_freedom, lower.tail=F)
margin_error <- t_score * standard_error
 
# Calculating lower bound and upper bound
lower_bound <- mean_value - margin_error
upper_bound <- mean_value + margin_error
 
# Print the confidence interval
print(c(lower_bound,upper_bound))

embedded from [https://www.geeksforgeeks.org/how-to-find-confidence-intervals-in-r/][3]

But how should I calculate the confidence interval based on the grouped data using the aggregate function in one go like the rest?

My main problem here is that instead of having one n, in grouped data, there will definitely having more than one n.

I have tried using dplyr tool too, but the output is incorrect, something seemed off.

> tab.sep.file %>%
+ group_by(SDP,M346) %>%
+ summarise(mean = mean(tab.sep.file$Hic8),
+ sd = sd(tab.sep.file$Hic8),
+ n = n()) %>%
+ mutate(se = sd / sqrt(n),
+ lower.ci = mean - qt(1 - (0.05 / 2 ), n -1) * se,
+ upper.ci = mean + qt(1 - (0.05 / 2 ), n -1) * se)

Solution

You need specify grouping in aggregate. e.g. via formula. Try this

> aggf <- \(x) {
+   n <- length(x)
+   xmn <- mean(x)
+   xsd <- sd(x)
+   se <- xsd/sqrt(n)
+   ci <- xmn + se*(-qt(p=.05/2, df=n - 1)) %*% cbind(-1, 1)
+   c(mean=xmn, sd=xsd, se=se, ci=ci)
+ }
> aggregate(Hic8 ~ sdp, dat, aggf)
   sdp   Hic8.mean     Hic8.sd     Hic8.se    Hic8.ci1    Hic8.ci2
1 SDP1  0.64358665  0.20498872  0.08368630  0.42846418  0.85870913
2 SDP2  0.32132755  0.16655345  0.09615968 -0.09241416  0.73506925
3 SDP3  0.46302347  0.37685995  0.21758019 -0.47314854  1.39919549

Data

> dput(dat)
structure(list(sdp = c("SDP1", "SDP1", "SDP1", "SDP2", "SDP2", 
"SDP2", "SDP1", "SDP1", "SDP1", "SDP3", "SDP3", "SDP3"), Hic8 = c(0.854164426913485, 
0.382015778450295, 0.799342160811648, 0.144684031140059, 0.475511683383957, 
0.343786930665374, 0.629821318900213, 0.412683063885197, 0.783493172377348, 
0.534960983321071, 0.798729537520558, 0.0553799034096301)), class = "data.frame", row.names = c(NA, 
-12L))