Search code examples
rloopsapply

Rewrite a sequence of assignments with the apply functions in R


I was trying to simulate a sequence of measures of heights and weights of horses by breed, sampling the data from a list of breeds of horses available in the UK. I ended up using a very long code made of lots of copy-paste and I think there must be a more efficient way to do this, but I tried with the apply functions, but I was unsuccessful.

The MWE is the following

library(tidyverse)

horses_meas <- tribble(
    ~breed, ~mean_h, ~sd_h, ~mean_w, ~sd_w,
    "Hannover", 1.69, 0.10, 600, 25,
    "Arabian", 1.50, 0.05, 400, 45,
    "Holsteiner", 1.68, 0.05, 575, 125
)

# Simulation by breed
type_1 <- bind_cols(breed = rep(horses_meas$breed[1], 20), 
                    weight = rnorm(20, horses_meas$mean_w[1], horses_meas$sd_w[1]),
                    height = rnorm(20, horses_meas$mean_h[1], horses_meas$sd_h[1]))
type_2 <- bind_cols(breed = rep(horses_meas$breed[2], 20), 
                    weight = rnorm(20, horses_meas$mean_w[2], horses_meas$sd_w[2]),
                    height = rnorm(20, horses_meas$mean_h[2], horses_meas$sd_h[2]))
type_3 <- bind_cols(breed = rep(horses_meas$breed[3], 20), 
                    weight = rnorm(20, horses_meas$mean_w[3], horses_meas$sd_w[3]),
                    height = rnorm(20, horses_meas$mean_h[3], horses_meas$sd_h[3]))

horses <- bind_rows(type_1, type_2, type_3)

print(horses)

Can you please help me transforming this into a more effective piece of code? I would like to be able to select a different number of horses (the ones in the MWE are just the outcome of a random sampling, but ideally this number may vary), and I don't want every time to cut and paste all the type_i assignments.

I reckon that something like assigning a temporary variable x <- NULL and then creating a list x['breed'] to fill in with for could help, but I would like to do this with the apply functions or using the tidyverse instead.

Thank you for any help you may give me.


Solution

  • One tidyverse approach would be to use purrr::pmap to loop over the rows of horses_meas like so:

    library(purrr)
    library(dplyr)
    
    set.seed(123)
    
    purrr::pmap(horses_meas,
      function(breed, mean_w, sd_w, mean_h, sd_h, n) {
        data.frame(
          breed = breed,
          weight = rnorm(n, mean_w, sd_w),
          height = rnorm(n, mean_h, sd_h)
        )
      },
      n = 5
    ) |>
      bind_rows()
    #>         breed   weight   height
    #> 1    Hannover 585.9881 1.861506
    #> 2    Hannover 594.2456 1.736092
    #> 3    Hannover 638.9677 1.563494
    #> 4    Hannover 601.7627 1.621315
    #> 5    Hannover 603.2322 1.645434
    #> 6     Arabian 455.0837 1.589346
    #> 7     Arabian 416.1916 1.524893
    #> 8     Arabian 418.0347 1.401669
    #> 9     Arabian 404.9807 1.535068
    #> 10    Arabian 374.9871 1.476360
    #> 11 Holsteiner 441.5220 1.595665
    #> 12 Holsteiner 547.7531 1.721889
    #> 13 Holsteiner 446.7494 1.687669
    #> 14 Holsteiner 483.8886 1.623093
    #> 15 Holsteiner 496.8701 1.742691