Assuming I have the following data:
df1 <- rnorm(100000, 20, 5)
I want to get the following samples from df1 with 50 trials each:
C <- c(25, 50, 100, 200, 300, 400, 500, 600)
Next, I want to plot a trend line, with sample size on the x-axis and the SD and SEM on the y-axis. Sorry, I was unable to draw the plot, but hopefully my description is clear. Thanks for your help.
I'm not sure exactly what you're trying to do here, but this is a first pass at what I think you want:
library(ggplot2)

# Personal plot theme: no grid lines, black axis lines, no legend title
my_theme <- theme_minimal() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "black"), legend.title = element_blank())

# Population: 100,000 draws from a normal distribution (mean 20, sd 5)
df1 <- rnorm(100000, 20, 5)
df <- data.frame(sample_size = c(25, 50, 100, 200, 300, 400, 500, 600))

# One random sample of each size, then the SD and standard error of each
samples <- lapply(df$sample_size, function(x) sample(df1, x))
df$std <- sapply(samples, sd)
df$se <- sapply(samples, function(x) sd(x) / sqrt(length(x)))

ggplot(data = df) +
  geom_point(aes(x = sample_size, y = std, colour = "std")) +
  geom_point(aes(x = sample_size, y = se, colour = "se")) +
  geom_smooth(aes(x = sample_size, y = std), method = "lm") +
  geom_smooth(aes(x = sample_size, y = se), method = "lm") +
  my_theme
I prefer to use the ggplot2 library for plots rather than what comes with base R. You can ignore the my_theme part; that's just the aesthetic I prefer. Here's the plot:
[Plot: std and se against sample_size, with a linear trend line for each]
If this isn't what you're looking for, you should be able to modify what's here to get what you want, unless I've completely misunderstood your question. In any case, the important part is to use lapply and sample to get a list of samples from df1. Then you can calculate the standard deviation of each using sapply and sd, and the standard error using sapply, sd, sqrt, and length.
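The question also asks for 50 trials per sample size, which the code above does not cover (it draws one sample per size). Here is a minimal sketch of one way to read that, assuming a "trial" means an independent random sample drawn without replacement and that averaging across trials is the summary you want. The names population, n_trials, summarise_size and trial_summary are made up for illustration, and the fixed seed is only there to make the sketch reproducible.

```r
set.seed(42)  # assumption: fixed seed so the sketch is reproducible

population <- rnorm(100000, mean = 20, sd = 5)  # same setup as df1
sample_sizes <- c(25, 50, 100, 200, 300, 400, 500, 600)
n_trials <- 50  # assumption: "50 trials" = 50 independent samples per size

# For one sample size n, draw n_trials samples and average their SD and SE.
summarise_size <- function(n) {
  sds <- replicate(n_trials, sd(sample(population, n)))
  data.frame(sample_size = n,
             mean_sd = mean(sds),
             mean_se = mean(sds / sqrt(n)))
}

# One row per sample size
trial_summary <- do.call(rbind, lapply(sample_sizes, summarise_size))
print(trial_summary)
```

trial_summary has the same shape as df above, so it can be passed to the same ggplot call with std and se swapped for mean_sd and mean_se.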
Both sapply and lapply are useful, especially since user-defined functions can be passed in place as arguments. I'd recommend looking at lapply to understand what it's doing (sapply is simply a wrapper around lapply that simplifies the result to a vector or matrix where it can).
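To make the difference concrete, here is a small sketch on toy data (the values list and the variable names are made up for illustration). It also shows vapply, a base R relative of sapply that declares the expected result type up front, which some people prefer in non-interactive code.

```r
values <- list(a = c(1, 2, 3), b = c(10, 20, 30, 40))

as_list <- lapply(values, mean)    # always returns a list
as_vector <- sapply(values, mean)  # simplified to a named numeric vector

# An anonymous function passed in place; numeric(1) states that each
# result must be a single number, otherwise vapply raises an error.
checked <- vapply(values, function(x) max(x) - min(x), numeric(1))

str(as_list)
str(as_vector)
print(checked)
```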