If this is my data (a numeric vector):
> length <- rep(11:17, 200)
> mean(length)
[1] 14
> sd(length)
[1] 2.001
How can I take a random subsample from the data (length) that has almost the same mean and standard deviation?
You can repeatedly draw from length until you find enough samples that fit your requirements. It is not pretty, but it works.
length <- rep(11:17, 200)
# save mean and sd the subsamples should have
aimed_mean <- mean(length)
aimed_sd <- sd(length)
# set number of replications / iterations
n_replication <- 1000
# set size of sample
size_sample <- 40
# set desired number of samples
n_sample <- 3
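# set the maximum absolute deviation from mean and sd you can accept
# (1.5 is very loose here because the sd is only about 2; lower it for a closer match)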
deviation_mean <- 1.5
deviation_sd <- 1.5
# create empty container for resulting samples
samples <- list()
# Repeatedly sample from length
i <- 0
sample_count <- 0
repeat {
  i <- i + 1
  # take a sample from length (without replacement)
  sample_length <- sample(length, size_sample)
  # keep the sample when it is close enough
  if (abs(aimed_mean - mean(sample_length)) < deviation_mean &&
      abs(aimed_sd - sd(sample_length)) < deviation_sd) {
    sample_count <- sample_count + 1
    # index by sample_count so the list has no empty slots
    samples[[sample_count]] <- sample_length
  }
  # stop after n_replication draws or once enough samples are found
  if (i == n_replication || sample_count == n_sample) {
    break
  }
}
# your samples
samples
# test whether it worked
lapply(samples, function(x){abs(mean(x)-aimed_mean)<deviation_mean})
lapply(samples, function(x){abs(sd(x)-aimed_sd)<deviation_sd})
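If you need this more than once, the same draw-and-reject idea can be wrapped in a function. The following is only a sketch: the function name draw_matching_samples, its arguments, and the tolerance of 0.1 are my own assumptions, not part of the code above, and the seed is there only to make the run reproducible.

```r
# Hypothetical helper (names and defaults are illustrative assumptions):
# draw subsamples of `x` and keep those whose mean and sd are within
# the given tolerances of the mean and sd of `x`.
draw_matching_samples <- function(x, size, n_sample = 3,
                                  tol_mean = 0.1, tol_sd = 0.1,
                                  max_iter = 1000) {
  target_mean <- mean(x)
  target_sd <- sd(x)
  accepted <- list()
  for (i in seq_len(max_iter)) {
    # draw without replacement, as in the loop above
    candidate <- sample(x, size)
    if (abs(mean(candidate) - target_mean) < tol_mean &&
        abs(sd(candidate) - target_sd) < tol_sd) {
      accepted[[length(accepted) + 1]] <- candidate
      if (length(accepted) == n_sample) break
    }
  }
  # tell the caller when the tolerances were too strict for max_iter draws
  if (length(accepted) < n_sample) {
    warning("only ", length(accepted), " of ", n_sample,
            " requested samples found within ", max_iter, " draws")
  }
  accepted
}

set.seed(42)  # only for reproducibility
x <- rep(11:17, 200)
result <- draw_matching_samples(x, size = 40)
sapply(result, mean)
sapply(result, sd)
```

Tighter tolerances reject more draws, so you may have to raise max_iter; the warning tells you when fewer samples than requested were found.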