How can I take a subsample with almost the same mean and standard deviation as the population?

If this is my data (a numeric vector):

> length <- rep(11:17, 200)
> mean(length)
[1] 14
> sd(length)
[1] 2.001

How can I take a random subsample from the vector (length) that has almost the same mean and standard deviation?

Solution

You can repeatedly draw from length until you find enough samples that fit your requirements. It is not pretty, but it works.
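To pick sensible tolerances, it can help to first check how much the mean of a random sample of the chosen size varies by chance. A rough sketch, assuming a sample size of 40 as in the code that follows and ignoring the small finite-population correction (the name se_mean is only illustrative):

```r
length <- rep(11:17, 200)
size_sample <- 40

# approximate standard error of the mean for a sample of this size
# (assumption: the finite-population correction is ignored)
se_mean <- sd(length) / sqrt(size_sample)
se_mean
```

Tolerances far above this value accept almost every draw, while tolerances far below it need many more draws before enough samples are found.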

length <- rep(11:17, 200)

# save mean and sd the subsamples should have
aimed_mean <- mean(length)
aimed_sd <- sd(length)

# set the maximum number of draws (iterations) to try
n_replication <- 1000

# set size of sample
size_sample <- 40

# set desired number of samples
n_sample <- 3

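# set the largest deviation from the mean and sd you can accept
# (1.5 is very loose for these data; smaller values give closer
# matches but need more draws)
deviation_mean <- 1.5
deviation_sd <- 1.5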

# create empty container for resulting samples
# (list(n_replication) would create a list holding the number 1000)
samples <- list()

# Repeatedly sample from length
i <- 0
sample_count <- 0

repeat {
  
  i <- i+1
  
  # take a sample from length
  sample_length <- sample(length, size_sample)
  
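  # keep the sample when it is close enough
  if(abs(aimed_mean - mean(sample_length)) < deviation_mean &&
     abs(aimed_sd - sd(sample_length)) < deviation_sd){
    
    # index by the number of accepted samples, not by the iteration,
    # so the list has no empty slots
    sample_count <- sample_count + 1
    samples[[sample_count]] <- sample_length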
    
  }
  
  # stop at the iteration limit or once enough samples are found
  if(i == n_replication || sample_count == n_sample){
    break
  }
  
}

# your samples
samples

# test whether it worked
lapply(samples, function(x){abs(mean(x)-aimed_mean)<deviation_mean})
lapply(samples, function(x){abs(sd(x)-aimed_sd)<deviation_sd})
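The same idea can be wrapped in a function, which makes it easier to try tighter tolerances. This is only a sketch: draw_matching_samples, its argument names, the max_iter default, the tolerances of 0.1 and the seed are my own choices, not part of the code above.

```r
# Hypothetical helper: draw subsamples of x until n_sample of them
# have a mean and sd within the given tolerances, or max_iter is reached.
draw_matching_samples <- function(x, size_sample, n_sample,
                                  tol_mean, tol_sd, max_iter = 10000) {
  target_mean <- mean(x)
  target_sd <- sd(x)
  accepted <- vector("list", n_sample)
  n_accepted <- 0

  for (iter in seq_len(max_iter)) {
    candidate <- sample(x, size_sample)
    if (abs(mean(candidate) - target_mean) < tol_mean &&
        abs(sd(candidate) - target_sd) < tol_sd) {
      n_accepted <- n_accepted + 1
      accepted[[n_accepted]] <- candidate
      if (n_accepted == n_sample) break
    }
  }

  # drop unused slots if the iteration limit was reached first
  accepted[seq_len(n_accepted)]
}

set.seed(42)  # arbitrary seed, only to make the run reproducible
length <- rep(11:17, 200)
result <- draw_matching_samples(length, size_sample = 40, n_sample = 3,
                                tol_mean = 0.1, tol_sd = 0.1)

# mean and sd of each accepted sample
sapply(result, mean)
sapply(result, sd)
```

If the function returns fewer than n_sample samples, the tolerances were probably too tight for the chosen sample size; loosen them or raise max_iter.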