I have the following problem:
First, I create the data frame:
Name <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t")
Values <- c(0.1, 0.05, 0.03, 0.06, -0.1, -0.3, -0.05, 0.5, 0.12, 0.06, 0.04, 0.15, 0.13, 0.16, -0.12, -0.03, -0.5, 0.05, 0.07, 0.03)
data <- data.frame(Name, Values)
The relevant part:
# extract Values column
Values <- data$Values
# define sizes of subset and number of iterations
n_small <- 5
n_large <- 15
n_iterations <- 10
set.seed(123456)
# Initialize result vector
Averages_small <- NULL
Averages_large <- NULL
# Calculate average of the subset and allocate it to the result vector
for (i in n_iterations) {
Averages_small[i] <- mean(sample(Values, n_small, replace = FALSE))
Averages_large[i] <- mean(sample(Values, n_large, replace = FALSE))
}
Somehow this gives me nine NAs and one number. What am I doing wrong? And is there a better way than a for loop? The above is just an example without NA values; the original data set has 20k rows and may contain missing values.
FYI, for background: the Values are returns on investments, and the question is whether holding a larger number of investments helps diversification.
Thank you very much for your help!
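The NAs come from the loop header: for (i in n_iterations) iterates over the single value 10, so the body runs once with i = 10. Only element 10 of each result vector is filled, and elements 1 to 9 are padded with NA. Iterating over seq_len(n_iterations) instead visits 1, 2, ..., 10. A minimal sketch of the corrected loop, reusing the names from the question (preallocating with numeric() is my addition and is optional):

```r
Values <- c(0.1, 0.05, 0.03, 0.06, -0.1, -0.3, -0.05, 0.5, 0.12, 0.06,
            0.04, 0.15, 0.13, 0.16, -0.12, -0.03, -0.5, 0.05, 0.07, 0.03)

n_small <- 5
n_large <- 15
n_iterations <- 10
set.seed(123456)

# Preallocate the result vectors (optional, avoids growing them inside the loop)
Averages_small <- numeric(n_iterations)
Averages_large <- numeric(n_iterations)

# seq_len(n_iterations) is 1, 2, ..., 10; a bare n_iterations is only the value 10
for (i in seq_len(n_iterations)) {
  Averages_small[i] <- mean(sample(Values, n_small, replace = FALSE))
  Averages_large[i] <- mean(sample(Values, n_large, replace = FALSE))
}
```

With this change both vectors hold 10 averages and no NAs.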
You can use replicate to get 10 draws of your sample. This returns a matrix with the samples in columns, so the colMeans of this matrix gives you the vector you are looking for:
set.seed(1) # For reproducibility
vec5 <- colMeans(replicate(10, sample(data$Values, 5)))
vec15 <- colMeans(replicate(10, sample(data$Values, 15)))
vec5
#> [1] -0.014 0.148 0.044 -0.026 0.062 0.020 -0.032 -0.130 0.166 0.040
vec15
#> [1] 0.058000000 0.024666667 0.051333333 0.045333333 0.024000000
#> [6] 0.010666667 0.022666667 -0.010000000 0.003333333 -0.001333333
You can see that the standard deviation of vec5 is indeed larger than that of vec15:
sd(vec5)
#> [1] 0.08711908
sd(vec15)
#> [1] 0.02297406
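Regarding missing values in the full data set: sample() can draw NAs like any other element, and mean() then returns NA for that draw. One option, sketched here under the assumption that missing returns can simply be dropped, is to remove them before sampling. The names mean_of_draws and values_with_na are hypothetical and the example vector is made up for illustration:

```r
# Hypothetical helper: mean of `size` values drawn without replacement,
# repeated `n_draws` times, after dropping missing values
mean_of_draws <- function(x, size, n_draws) {
  x <- x[!is.na(x)]  # drop NAs first so every draw has `size` real values
  replicate(n_draws, mean(sample(x, size)))
}

# Made-up example vector with two missing values
values_with_na <- c(0.1, NA, 0.03, 0.06, -0.1, -0.3, NA, 0.5, 0.12, 0.06,
                    0.04, 0.15, 0.13, 0.16, -0.12, -0.03, -0.5, 0.05, 0.07, 0.03)

set.seed(1)  # For reproducibility
means_small <- mean_of_draws(values_with_na, 5, 10)
means_large <- mean_of_draws(values_with_na, 15, 10)
```

Calling mean(..., na.rm = TRUE) after sampling would also avoid NA results, but some draws would then be averaged over fewer than the intended number of values.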