
Repeated sampling and plotting the standard deviation and standard error of the mean with a trend line


Assuming I have the following data:

df1 <- rnorm(100000, 20, 5)

I want to draw samples of the following sizes from df1, with 50 trials each:

C <- c(25, 50, 100, 200, 300, 400, 500, 600)

Next, I want to plot a trend line, with sample size on the x-axis and the SD and SEM on the y-axis. Sorry, I was unable to draw the plot, but I hope my description is clear. Thanks for your help.


Solution

  • I'm not sure exactly what you're trying to do, but here is a first pass at what I think you want:

    library(ggplot2)
    
    # Optional styling: no grid lines, black axis lines, no legend title
    my_theme <- theme_minimal() +
        theme(panel.grid.major = element_blank(),
              panel.grid.minor = element_blank(),
              axis.line = element_line(colour = "black"),
              legend.title = element_blank())
    
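    # Population: 100,000 draws from a normal distribution (mean 20, sd 5)
    df1 <- rnorm(100000, 20, 5)
    
    df <- data.frame(sample_size = c(25, 50, 100, 200, 300, 400, 500, 600))
    
    # Draw one sample of each size from the population
    samples <- lapply(df$sample_size, function(x) sample(df1, x))
    
    # Standard deviation and standard error of the mean for each sample
    df$std <- sapply(samples, sd)
    df$se <- sapply(samples, function(x) sd(x) / sqrt(length(x)))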
    
    
    # Plot both statistics against sample size, each with a linear trend line
    ggplot(data = df) +
        geom_point(aes(x = sample_size, y = std, colour = "std")) +
        geom_point(aes(x = sample_size, y = se, colour = "se")) +
        geom_smooth(aes(x = sample_size, y = std), method = "lm") +
        geom_smooth(aes(x = sample_size, y = se), method = "lm") +
        my_theme
    

    I prefer the ggplot2 library for plots over base R graphics. You can ignore the my_theme part; that's just the aesthetic I prefer. Here's the plot:

    [Plot: std and se against sample_size, each with a linear trend line]

    If this isn't what you're looking for, you should be able to adapt it, unless I've completely misunderstood your question. In any case, the important part is to use lapply and sample to build a list of samples from df1. You can then compute the standard deviation of each sample with sapply and sd, and the standard error with sapply, sd, sqrt, and length.
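The question also asks for 50 trials per sample size, whereas the code above draws a single sample per size. One possible extension, as a rough sketch rather than a definitive answer, is to use replicate to repeat the draw and then average over the trials. The names population, sample_sizes, n_trials, trial_sd and summary_df, and the fixed seed, are assumptions made for this illustration:

```r
set.seed(42)  # assumption: fixed seed so the sketch is reproducible

population   <- rnorm(100000, mean = 20, sd = 5)
sample_sizes <- c(25, 50, 100, 200, 300, 400, 500, 600)
n_trials     <- 50

# For each sample size, draw n_trials samples and keep the SD of each one.
# sapply() binds the results into an n_trials x length(sample_sizes) matrix.
trial_sd <- sapply(sample_sizes, function(n) {
  replicate(n_trials, sd(sample(population, n)))
})

# Average over trials; the SEM of each trial is sd / sqrt(n), and n is
# constant within a column, so the mean SEM is mean(sd) / sqrt(n).
summary_df <- data.frame(
  sample_size = sample_sizes,
  mean_sd     = colMeans(trial_sd),
  mean_se     = colMeans(trial_sd) / sqrt(sample_sizes)
)
```

summary_df has one row per sample size, so it could be passed to the ggplot call above with mean_sd and mean_se in place of std and se.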

    Both sapply and lapply are useful, especially since user-defined functions can be passed inline as arguments. I'd recommend reading up on lapply to understand what it's doing (sapply is a wrapper around lapply that simplifies the result to a vector or matrix where possible).
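To make the difference concrete, here is a small illustration on a toy list. The names x, as_list and as_vec are made up for this example:

```r
x <- list(a = 1:3, b = 4:6)

# lapply always returns a list, one element per input element
as_list <- lapply(x, mean)

# sapply calls lapply and then simplifies; here each result has length 1,
# so the output is a named numeric vector
as_vec <- sapply(x, mean)

# An inline user-defined function works the same way
ranges <- sapply(x, function(v) max(v) - min(v))
```

Because sapply chooses its return type from the results, lapply (or vapply, which takes an explicit result template) can be the safer choice when the shape of the output matters.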