Search code examples
rtidyversemcmc

lapply instead of for loop for randomised hypothesis testing r


I have a df that looks something like this like this:

set.seed(42)
ID <- sample(1:30, 100, rep=T) 
Trait <- sample(0:1, 100, rep=T) 
Year <- sample(1992:1999, 100, rep=T)
df <- cbind(ID, Trait, Year)
df <- as.data.frame(df)

Where ID is an individual organism, trait is a presence/absence of a phenotype and Year is the year an observation was made.

I would like to model if trait is random between individuals, something like this

library(MCMCglmm) 
m <- MCMCglmm(Trait ~ ID, random = ~ Year, data = df, family = "categorical")

Now, would like to shuffle the Trait column and run x permutations, to check if my observed mean and CI fall outside of what's expected from random. I could do this with a for loop, but I'd rather use a tidyverse solution. I've read that lapply is a bette(?) alternative, but I am struggling to find a specific enough walk-through that I can follow.

I'd appreciate any advice offered here.

Cheers!

Jamie


Solution

  • EDIT October 10th. Cleaned up the code and per comment below added the code to give you back a nice organized tibble\dataframe

    ### decide how many shuffles you want and name them
    ### in an orderly fashion for the output
    
    shuffles <- 1:10
    names(shuffles) <- paste0("shuffle_", shuffles)
    
    library(MCMCglmm)
    library(dplyr)
    library(tibble)
    library(purrr)
    
    ddd <- purrr::map(shuffles,
                      ~ df %>%
                         mutate(Trait = sample(Trait)) %>%
                         MCMCglmm(fixed = Trait ~ ID,
                                  random = ~ Year,
                                  data = .,
                                  family = "categorical",
                                  verbose = FALSE)) %>%
       purrr::map( ~ tibble::as_tibble(summary(.x)$solutions, rownames = "model_term")) %>%
       dplyr::bind_rows(., .id = 'shuffle')
    ddd
    #> # A tibble: 20 x 7
    #>    shuffle    model_term  post.mean `l-95% CI` `u-95% CI` eff.samp pMCMC
    #>    <chr>      <chr>           <dbl>      <dbl>      <dbl>    <dbl> <dbl>
    #>  1 shuffle_1  (Intercept)  112.         6.39     233.       103.   0.016
    #>  2 shuffle_1  ID            -6.31     -13.5       -0.297    112.   0.014
    #>  3 shuffle_2  (Intercept)   24.9      -72.5      133.       778.   0.526
    #>  4 shuffle_2  ID            -0.327     -6.33       5.33     849.   0.858
    #>  5 shuffle_3  (Intercept)    4.39     -77.3       87.4      161.   0.876
    #>  6 shuffle_3  ID             1.04      -3.84       5.99     121.   0.662
    #>  7 shuffle_4  (Intercept)    7.71     -79.0      107.       418.   0.902
    #>  8 shuffle_4  ID             0.899     -4.40       6.57     408.   0.694
    #>  9 shuffle_5  (Intercept)   30.4      -62.4      144.       732.   0.51 
    #> 10 shuffle_5  ID            -0.644     -6.61       4.94     970.   0.866
    #> 11 shuffle_6  (Intercept)  -45.5     -148.        42.7      208.   0.302
    #> 12 shuffle_6  ID             4.73      -0.211     11.6       89.1  0.058
    #> 13 shuffle_7  (Intercept)  -16.2     -133.        85.9      108.   0.696
    #> 14 shuffle_7  ID             2.47      -2.42      10.3       47.8  0.304
    #> 15 shuffle_8  (Intercept)    0.568      0.549      0.581      6.60 0.001
    #> 16 shuffle_8  ID            -0.0185    -0.0197    -0.0168     2.96 0.001
    #> 17 shuffle_9  (Intercept)   -6.95    -112.        92.2      452.   0.886
    #> 18 shuffle_9  ID             2.07      -3.30       8.95     370.   0.476
    #> 19 shuffle_10 (Intercept)   43.8      -57.0      159.       775.   0.396
    #> 20 shuffle_10 ID            -1.36      -7.44       5.08     901.   0.62
    

    Your original data

    set.seed(42)
    ID <- sample(1:30, 100, rep=T) 
    Trait <- sample(0:1, 100, rep=T) 
    Year <- sample(1992:1999, 100, rep=T)
    df <- cbind(ID, Trait, Year)
    df <- as.data.frame(df)