Tags: r, ggplot2, usage-statistics

Interpretation of "stat_summary = mean_cl_boot" in ggplot2?


Perhaps a simple question: I tried to make an error-bar graph like the one shown on page 532 of Field's "Discovering Statistics Using R".

The code can be found at http://www.sagepub.com/dsur/study/DSUR%20R%20Script%20Files/Chapter%2012%20DSUR%20GLM3.R:

line <- ggplot(gogglesData, aes(alcohol, attractiveness, colour = gender))
line + stat_summary(fun.y = mean, geom = "point") +
  stat_summary(fun.y = mean, geom = "line", aes(group = gender)) +
  stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width = 0.2) +
  labs(x = "Alcohol Consumption", y = "Mean Attractiveness of Date (%)", colour = "Gender")

I produced the same graph with my own data. My y-axis variable takes only four values (it is a discrete scale, 1-4), yet the y-axis now shows the values 1.5, 2, 2.5, and the lines vary between them.

My question is: what do these points and lines describe? I assume the important part is stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width = 0.2). Are they counts of observations for each group at each level of the x-axis? Are they frequencies? Or are they proportions?

I found this: http://docs.ggplot2.org/0.9.3/stat_summary.html, but it did not help me.

Thank you


Solution

  • Here is what the ggplot2 book on page 83 says about mean_cl_boot()

    Function         Hmisc original    Middle   Range
    mean_cl_boot()   smean.cl.boot()   Mean     Standard error from bootstrap
    

    I think that it is smean.cl.boot() from the Hmisc package, renamed to mean_cl_boot() in ggplot2.

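    And here is the definition of the original function from the Hmisc package:

    smean.cl.boot is a very fast implementation of the basic nonparametric bootstrap for obtaining confidence limits for the population mean without assuming normality.

    So the points in your plot are the group means of the y variable, and the error bars are bootstrap confidence limits for those means. They are not counts, frequencies, or proportions, which is why values such as 1.5 or 2.5 appear even though the scale itself only takes the values 1-4.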
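
    To make the idea concrete, here is a rough base-R sketch of a percentile bootstrap for the mean. It is only an illustration of the technique described in the Hmisc definition, not the actual Hmisc or ggplot2 source. The helper name boot_mean_ci, the defaults B = 1000 and conf = 0.95, and the made-up ratings vector are all assumptions chosen for this example.

```r
# Hypothetical helper (not part of ggplot2 or Hmisc): a percentile bootstrap
# for the mean. It returns the y / ymin / ymax columns that
# stat_summary(fun.data = ...) expects from a summary function.
boot_mean_ci <- function(x, B = 1000, conf = 0.95) {
  x <- x[!is.na(x)]
  n <- length(x)
  # Resample with replacement B times and record the mean of each resample
  boot_means <- replicate(B, mean(sample(x, n, replace = TRUE)))
  # Take the lower and upper quantiles of the resampled means as the limits
  limits <- quantile(boot_means, c((1 - conf) / 2, (1 + conf) / 2), names = FALSE)
  data.frame(y = mean(x), ymin = limits[1], ymax = limits[2])
}

set.seed(1)                                  # make the resampling reproducible
ratings <- sample(1:4, 40, replace = TRUE)   # made-up ratings on a 1-4 scale
res <- boot_mean_ci(ratings)
print(res)
```

    Here y is the plain mean of the ratings (the point in the plot), while ymin and ymax are the lower and upper ends of the error bar. Because they summarise means, all three can fall between the integer scale values.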