Search code examples
rdata-visualizationt-test

How to plot Two Sample t.test() means, medians, and CI in R?


With the data I have, this R code x <- t.test(Age ~ Completers, var.equal = TRUE, data = data) renders the following result:

    Two Sample t-test

data:  Age by Completers
t = 0.93312, df = 1060, p-value = 0.351
alternative hypothesis: true difference in means between group Completers and group Non Completers is not equal to 0
95 percent confidence interval:
 -0.5844018  1.6442118
sample estimates:
    mean in group Completers mean in group Non Completers
                    37.16052                     36.63062

What I would like is to plot each mean (found in x$estimate[1] and x$estimate[2]) with its own point on the x axis at its proper height on the y axis (on the same graph) and each point complemented with the same confidence interval (CI) (found in x$conf.int[1] and x$conf.int[2]). Like this[*]:

enter image description here

Unfortunately, if I'm not mistaken, plot() (from the Generic X-Y Plotting) does not seem to handle this. So I tried with plotCI (from gplots) as follows:

library(gplots)

plotCI(x = x$estimate[1], y = x$estimate[2], 
       li = x$conf.int[1], ui = x$conf.int[2])

But it renders as shown below:

enter image description here

My questions:

  • Is there a way to obtain a plot such as in the first graph with Base R code?
  • If not, what would be the solution (short of using the jmv:: code (see [*]))?

EDIT

As requested in the comments, please find hereunder some code that help reproduce the data (T-Test results are won't be exactly the same as above, but the idea is the same):

# Generate random numbers with specific mean and standard deviation
completers <- data.frame(Completers = 1, 
                         Age = rnorm(100, mean = 37.16052, sd = 8.34224))
nonCompleters <- data.frame(Completers = 0, 
                            Age = rnorm(100, mean = 36.63062, sd = 11.12173))

# Convert decimaled number to integers
completers[] <- lapply(completers, as.integer)
nonCompleters[] <- lapply(nonCompleters, as.integer)

# Stack data from 2 different data frames
df <- rbind(completers, nonCompleters)

# Remove useless data frames
rm(completers, nonCompleters)

# Age ~ Completers (T-Test)
(tTest <- t.test(df$Age ~ df$Completers, var.equal = TRUE))

Sources:


[*] Graph obtained with Jamovi Version 2.3.15.0 which uses the following code (but I would like to avoid using jmv::):

jmv::ttestIS(
  formula = Age ~ Completers,
  data = data,
  plots = TRUE
)

System used:

  • R 4.2.1
  • RStudio 2022.07.1 Build 554
  • macOS Monterey Version 12.5.1 (Intel)

Solution

  • There appears to be a misalignment of what you want and what t.test() is giving you. t.test() let you know if there is a difference in means, and report the CI of the difference in sample means (not the CIs of the individual means).

    Since you stated you want the CIs of the individual means using base R, you can accomplish this by:

    Sample data

    nn <- 100
    df <- data.frame(Completers = rep(c(1,0), each = nn),
                     Age = c(as.integer(rnorm(nn, mean = 37.16052, sd = 8.34224)),
                             as.integer(rnorm(nn, mean = 36.63062, sd = 11.12173))))
    

    With the raw data, calculate the summary statistics and confidence interval:

    # Base R - find summary statistics and restructure into data frame
    df_summary <- aggregate(Age ~ Completers, df, function(x) c(mean = mean(x), 
                                                                sd = sd(x), 
                                                                median = median(x), 
                                                                n = length(x)))
    df_summary <- data.frame(Completers = df_summary[, 1], df_summary$Age) #reformat nested matrix
    
    # Calculate 95% CI
    alpha <- 0.05/2
    
    # Lower CI
    df_summary$ci_low <-
      df_summary$mean - qt(1 - alpha, df = df_summary$n) * df_summary$sd /
      sqrt(df_summary$n)
    
    # Upper CI
    df_summary$ci_hi <-
      df_summary$mean + qt(1 - alpha, df = df_summary$n) * df_summary$sd /
      sqrt(df_summary$n)
    
    # Output
    
    #  Completers  mean        sd median   n   ci_low    ci_hi
    #1          0 34.94 10.730698     34 100 32.81106 37.06894
    #2          1 37.43  7.645234     37 100 35.91321 38.94679
    
    
    

    Now you can plot the mean and CI for each group (your example also mentioned you wanted the median in there):

    # Set Y limits (change to whatever)
    ylimits <- c(min(df_summary$ci_low) - 1,
                 max(df_summary$ci_hi) + 1)
    
    # Plot
    plot(NA, xlim = c(0,3), ylim = ylimits, # blank plot
         axes = FALSE, xlab = "", ylab = "")
    segments(x0 = c(1,2), y0 = df_summary$ci_low, y1 = df_summary$ci_hi) # add segments
    points(df_summary$mean, pch = 19) # add means
    points(df_summary$median, pch = 0)
    axis(1, at = 0:3, labels = c(NA, "Completers", "Noncompleters", NA)) # add x axis
    axis(2) #add y axis
    mtext(side = 1, "Completers", padj = 4) # add x label 
    mtext(side = 2, "Age", padj = -4) # add y label
    legend("topleft", c("Mean", "Median", "95% CI"),
           pch = c(19, 0, NA), lty = c(NA, NA, 1), bty = "n")
    

    Output: enter image description here