I was trying to simulate a sequence of measures of heights and weights of horses by breed, sampling the data from a list of breeds of horses available in the UK. I ended up using a very long code made of lots of copy-paste and I think there must be a more efficient way to do this, but I tried with the apply
functions, but I was unsuccessful.
The MWE is the following
library(tidyverse)
horses_meas <- tribble(
~breed, ~mean_h, ~sd_h, ~mean_w, ~sd_w,
"Hannover", 1.69, 0.10, 600, 25,
"Arabian", 1.50, 0.05, 400, 45,
"Holsteiner", 1.68, 0.05, 575, 125
)
# Simulation by breed
type_1 <- bind_cols(breed = rep(horses_meas$breed[1], 20),
weight = rnorm(20, horses_meas$mean_w[1], horses_meas$sd_w[1]),
height = rnorm(20, horses_meas$mean_h[1], horses_meas$sd_h[1]))
type_2 <- bind_cols(breed = rep(horses_meas$breed[2], 20),
weight = rnorm(20, horses_meas$mean_w[2], horses_meas$sd_w[2]),
height = rnorm(20, horses_meas$mean_h[2], horses_meas$sd_h[2]))
type_3 <- bind_cols(breed = rep(horses_meas$breed[3], 20),
weight = rnorm(20, horses_meas$mean_w[3], horses_meas$sd_w[3]),
height = rnorm(20, horses_meas$mean_h[3], horses_meas$sd_h[3]))
horses <- bind_rows(type_1, type_2, type_3)
print(horses)
Can you please help me transforming this into a more effective piece of code? I would like to be able to select a different number of horses (the ones in the MWE are just the outcome of a random sampling, but ideally this number may vary), and I don't want every time to cut and paste all the type_i
assignments.
I reckon that something like assigning a temporary variable x <- NULL
and then creating a list x['breed']
to fill in with for
could help, but I would like to do this with the apply
functions or using the tidyverse
instead.
Thank you for any help you may give me.
One tidyverse
approach would be to use purrr::pmap
to loop over the rows of horses_meas
like so:
library(purrr)
library(dplyr)
set.seed(123)
purrr::pmap(horses_meas,
function(breed, mean_w, sd_w, mean_h, sd_h, n) {
data.frame(
breed = breed,
weight = rnorm(n, mean_w, sd_w),
height = rnorm(n, mean_h, sd_h)
)
},
n = 5
) |>
bind_rows()
#> breed weight height
#> 1 Hannover 585.9881 1.861506
#> 2 Hannover 594.2456 1.736092
#> 3 Hannover 638.9677 1.563494
#> 4 Hannover 601.7627 1.621315
#> 5 Hannover 603.2322 1.645434
#> 6 Arabian 455.0837 1.589346
#> 7 Arabian 416.1916 1.524893
#> 8 Arabian 418.0347 1.401669
#> 9 Arabian 404.9807 1.535068
#> 10 Arabian 374.9871 1.476360
#> 11 Holsteiner 441.5220 1.595665
#> 12 Holsteiner 547.7531 1.721889
#> 13 Holsteiner 446.7494 1.687669
#> 14 Holsteiner 483.8886 1.623093
#> 15 Holsteiner 496.8701 1.742691