Search code examples
rapplynested-listsstatistics-bootstrap

Working across sub-lists with apply() functions


I am trying to the bootstrap the proportional occurrence of diet items for 7 individuals and calculate a sd()

Lets say there are 9 prey items on the menu.

Diet <- c("Beaver","Bird", "Bobcat","Coyote", "Deer", "Elk",
    "Porcupine", "Raccoon",   "SmMamm")

And that these prey items are eaten by 7 different individuals of the same species

Inds <- c("P01", "P02", "P03", "P04", "P05", "P06", "P07")

My goal is the bootstrap the proportional occurrence of each diet item for each individual.
The loop below generates five diets for each individual (each diet containing N = 20 feedings) that were sampled with replacement. The data are stored as a list of the individuals, each of which contains a list of the sample diets.

BootIndDiet <- list()
IndTotboot <- list()
for(i in Inds){
    for(j in 1:5){
        BootIndDiet[[j]] <- prop.table(table(sample(Diet, 20 ,replace = T)))
                        }
            IndTotboot[[i]] <- BootIndDiet
            }

Below I have included the first two diets of individual P07 as an example of the loop results

$P07
$P07[[1]]

   Beaver      Bird    Bobcat      Deer       Elk 
     0.05      0.15      0.20      0.10      0.15 
Porcupine   Raccoon    SmMamm 
     0.15      0.15      0.05 

$P07[[2]]

   Beaver      Bird    Bobcat    Coyote      Deer 
     0.15      0.10      0.20      0.05      0.05 
      Elk Porcupine   Raccoon    SmMamm 
     0.05      0.20      0.10      0.10 

I then want to calculate the sd() of the proportional of each species for each individual. Equivocally, for each individual (P01 - P07) I want the sd() of the proportional occurrence of each prey species across the 5 diets.

While my loop above runs, I suspect there is a better way (possibly using the boot() function) that avoids lists...

While I have only included 5 samples (bootstraps) for each individual here, I hope to generate 10000.

Suggestions on a different strategy or how to apply sd() across sub-lists is greatly appreciated.


Solution

  • I'd try to obtain an array (instead of a nested list) in this way:

       IndTotboot <-array(replicate(5*length(Inds),prop.table(table(sample(as.factor(Diet), 20 ,replace = T))),simplify=T), dim=c(length(Diet),5,length(Inds)), dimnames=list(Diet,NULL,Inds))
    

    With replicate you can execute an expression a given number of times and store the result as an array (if possible). I added an as.factor before Diet to make sure that the table takes trace of every Diet (even the ones with a 0 frequency).

    The IndTotboot object obtained is a 3-dimensional array where the first index indicates the Diet, the second the bootstrap replications and the third the Inds. From there you can use apply in the standard way.

    Edit:

    If you try str(IndTotboot) you get:

        > str(IndTotboot)
         num [1:9, 1:5, 1:7] 0.1 0.15 0.15 0.1 0.1 0.1 0.15 0.05 0.1 0.15 ...
         - attr(*, "dimnames")=List of 3
           ..$ : chr [1:9] "Beaver" "Bird" "Bobcat" "Coyote" ...
           ..$ : NULL
           ..$ : chr [1:7] "P01" "P02" "P03" "P04" ...
    

    The first line is the most important. It says num [1:9, 1:5, 1:7], which means a 9x5x7 array. The rest indicates the dimnames, the names of the dimensions, which is a list. They are the generalization of the rownames and the colnames for a matrix.

    Now, to obtain the sd for every Diet and Inds you just use apply:

        apply(IndTotboot,MARGIN=c(1,3),sd)