Search code examples
rpermutationveganpermute

Can function how() in package permute support unbalanced data in the context of defining permutation restrictions for vegan::adonis2()?


I'm conducting a study comparing insect communities in forest canopy gaps and closed-canopy forest in three distinct forested areas. My experimental design is hierarchical and unbalanced.

Simplified description of study design (this simplified design is reflected in dummy tibble below):

  • Three forested areas;
  • Within each forested area are two canopy gaps;
  • Associated with each canopy gap are three quadrats - one in the gap itself and two in adjacent closed-canopy forest (this is where data are unbalanced).

I am interested in comparing the community in gaps with the community in adjacent forest using PERMANOVA as implemented in vegan::adonis2(). Because the survey design is hierarchical, I must restrict permutations. However, package permute is reported to require that "the dataset be balanced," with the implication that "unbalanced data would have to be manually permuted" (link). My question is:

  • Is it true that permute offers no support for unbalance data?
  • If this is not true, how can I use package permute in a way which respects unbalanced data in the context of defining permutation restrictions within vegan::adonis2()?

Below is a dummy tibble in the format of my main data.

# read in packages
library(tibble)
library(vegan)
library(permute)

# Define tibble, where rows correspond to individual survey quadrats
# Tibble contains:
### three forested areas (column 'forestarea'); 
### six gaps (column 'gapNo'); 
### status of quadrat as gap or forest (column 'gapOrforest');
### count data for six insect species. 
data <- tibble(
  # three forest areas, each with 2 canopy gaps 
  forestarea = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3),
  gapNo = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6),
  # quadrat status as in gap or in forest
  gapOrforest = rep(c("g", "f", "f"), 6),
  # species data, integer counts
  species1 = c(50, 0, 0, 65, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 42, 0, 0),
  species2 = c(0, 10, 8, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 5, 9, 9),
  species3 = c(30, 0, 10, 0, 0, 23, 0, 0, 0, 45, 0, 0, 23, 10, 0, 43, 21, 60),
  species4 = c(0, 5, 12, 0, 0, 3, 20, 0, 0, 0, 15, 0, 0, 0, 18, 0, 0, 0),
  species5 = c(25, 0, 0, 0, 13, 0, 0, 0, 22, 0, 0, 0, 0, 0, 0, 14, 0, 0),
  species6 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 17, 4, 5, 0, 0, 0, 6, 0),
  species7 = c(14, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 4),
  species8 = c(0, 7, 0, 0, 0, 5, 0, 0, 0, 12, 0, 0, 0, 0, 0, 0, 0, 0),
  species9 = c(0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 5, 0, 0, 0, 0, 0),
  species10 = c(8, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0)
)

If data were balanced, PERMANOVA with restricted permutations would be carried out as below:

    dist <- vegdist(data[,4:ncol(data)])
    
    control <- how(within = Within(type = "free"),
                 plots = Plots(strata = data$gapNo, type = "none"),
                 nperm = 999,
                 observed = TRUE)
    
    adonis2(dist ~ gapNo, 
            data=data,
            permutations = control)

Can the use of function how() (or package permute more broadly) as shown above be modified to support unbalanced data?


Solution

  • Is it true that permute offers no support for unbalance data?

    No, it is not true. permute doesn't require the data to be balanced in general. However if you are trying to test a variable that varies at the whole plot (plots) level, then you do need to have a balanced design. I'm not sure how you would permute data from an unbalanced design in such a way that you keep the same ordering of samples (or residuals) within the whole plots if you have some whole plots with 1 sample and others with 2. It would be like trying to shove a square peg into a round hole.

    There might be a solution this issue - but if there is I don't know what it is and the developers of Canoco (from where the permutation designs used in permute originate) don't either. I'd be happy to learn about it however as then I could look to implement it in permute.

    If you want to test a whole plot variable in an unbalanced data set, the immediate solution is to drop one of the quadrats in the adjacent closed-canopy per forested area so that you have balance and then do the analysis.

    However, you don't seem to be testing a variable that varies at the whole plot level. So this whole balance things is moot, is it not? So long as you restrict samples to be permuted within the whole plots, it doesn't matter from a practical point of view if that data are unbalanced, as we are just randomly permuting samples within the levels of gapNo.

    For this kind of thing, I tend to prefer using the blocks argument to restrict permutations (as it's just easier to type). Here you could do

    h <- with(data,
      how(blocks = gapNo, nperm = 999)
    )
    

    and note that you don't want the observed permutation in the permutation as the pseudo $F$ is already computed for the observed data.

    As there are only 46656 permutations of these data, you might consider do an exact test and evaluate all possible permutations, rather than just doing 1000 random ones.

    Finally, your model is incorrect. gapNo codes for the sampling locations where you have 1 gap sample and 2 adjacent samples. If you fit a model with gapNo as covariate you won't be testing the effect of gapOrForest, but simply testing the differences between the six gaps. You need y ~ gapOrForest.

    Personally, I would fit this with dbrda() if you must use a dissimilarity:

    spp <- data[, 4:ncol(data)]
    dbrda(spp ~ gapOrForest + Condition(gapNo), 
          data = data,
          permutations = control)
    

    as now you can partial out the between gap variation, which is what you want, ideally.