With the data I have, this R
code x <- t.test(Age ~ Completers, var.equal = TRUE, data = data)
renders the following result:
Two Sample t-test
data: Age by Completers
t = 0.93312, df = 1060, p-value = 0.351
alternative hypothesis: true difference in means between group Completers and group Non Completers is not equal to 0
95 percent confidence interval:
-0.5844018 1.6442118
sample estimates:
mean in group Completers mean in group Non Completers
37.16052 36.63062
What I would like is to plot each mean (found in x$estimate[1]
and x$estimate[2]
) with its own point on the x
axis at its proper height on the y
axis (on the same graph) and each point complemented with the same confidence interval (CI) (found in x$conf.int[1]
and x$conf.int[2]
). Like this[*]:
Unfortunately, if I'm not mistaken, plot()
(from the Generic X-Y Plotting) does not seem to handle this. So I tried with plotCI
(from gplots) as follows:
library(gplots)
plotCI(x = x$estimate[1], y = x$estimate[2],
li = x$conf.int[1], ui = x$conf.int[2])
But it renders as shown below:
My questions:
jmv::
code (see [*]))?EDIT
As requested in the comments, please find hereunder some code that help reproduce the data (T-Test results are won't be exactly the same as above, but the idea is the same):
# Generate random numbers with specific mean and standard deviation
completers <- data.frame(Completers = 1,
Age = rnorm(100, mean = 37.16052, sd = 8.34224))
nonCompleters <- data.frame(Completers = 0,
Age = rnorm(100, mean = 36.63062, sd = 11.12173))
# Convert decimaled number to integers
completers[] <- lapply(completers, as.integer)
nonCompleters[] <- lapply(nonCompleters, as.integer)
# Stack data from 2 different data frames
df <- rbind(completers, nonCompleters)
# Remove useless data frames
rm(completers, nonCompleters)
# Age ~ Completers (T-Test)
(tTest <- t.test(df$Age ~ df$Completers, var.equal = TRUE))
Sources:
[*] Graph obtained with Jamovi Version 2.3.15.0 which uses the following code (but I would like to avoid using jmv::
):
jmv::ttestIS(
formula = Age ~ Completers,
data = data,
plots = TRUE
)
System used:
There appears to be a misalignment of what you want and what t.test()
is giving you. t.test()
let you know if there is a difference in means, and report the CI of the difference in sample means (not the CIs of the individual means).
Since you stated you want the CIs of the individual means using base R, you can accomplish this by:
Sample data
nn <- 100
df <- data.frame(Completers = rep(c(1,0), each = nn),
Age = c(as.integer(rnorm(nn, mean = 37.16052, sd = 8.34224)),
as.integer(rnorm(nn, mean = 36.63062, sd = 11.12173))))
With the raw data, calculate the summary statistics and confidence interval:
# Base R - find summary statistics and restructure into data frame
df_summary <- aggregate(Age ~ Completers, df, function(x) c(mean = mean(x),
sd = sd(x),
median = median(x),
n = length(x)))
df_summary <- data.frame(Completers = df_summary[, 1], df_summary$Age) #reformat nested matrix
# Calculate 95% CI
alpha <- 0.05/2
# Lower CI
df_summary$ci_low <-
df_summary$mean - qt(1 - alpha, df = df_summary$n) * df_summary$sd /
sqrt(df_summary$n)
# Upper CI
df_summary$ci_hi <-
df_summary$mean + qt(1 - alpha, df = df_summary$n) * df_summary$sd /
sqrt(df_summary$n)
# Output
# Completers mean sd median n ci_low ci_hi
#1 0 34.94 10.730698 34 100 32.81106 37.06894
#2 1 37.43 7.645234 37 100 35.91321 38.94679
Now you can plot the mean and CI for each group (your example also mentioned you wanted the median in there):
# Set Y limits (change to whatever)
ylimits <- c(min(df_summary$ci_low) - 1,
max(df_summary$ci_hi) + 1)
# Plot
plot(NA, xlim = c(0,3), ylim = ylimits, # blank plot
axes = FALSE, xlab = "", ylab = "")
segments(x0 = c(1,2), y0 = df_summary$ci_low, y1 = df_summary$ci_hi) # add segments
points(df_summary$mean, pch = 19) # add means
points(df_summary$median, pch = 0)
axis(1, at = 0:3, labels = c(NA, "Completers", "Noncompleters", NA)) # add x axis
axis(2) #add y axis
mtext(side = 1, "Completers", padj = 4) # add x label
mtext(side = 2, "Age", padj = -4) # add y label
legend("topleft", c("Mean", "Median", "95% CI"),
pch = c(19, 0, NA), lty = c(NA, NA, 1), bty = "n")