How to compress proportion data to remove 0 and 1 values?


I am working with vegetation cover data (proportions) for different height strata (0-5, 5-15, 15-30, and >30 cm, plus bare ground) across four sites (sitio) and two time periods (epoca: breeding and non-breeding season). I fitted a GLM with the beta distribution (glmmTMB) and then used emmeans. In this question I showed the model I am using and had my interpretation problems solved.

Now I want to know how I can compress or normalize my data columns to exclude 0 and 1 values, since I can't fit the model for variables that include 0 values (e.g. 0-5 cm vegetation cover):

beta_sd <- glmmTMB(X0.5 ~ sitio * epoca,
                   data = vege2,
                   family = beta_family)
Error in eval(family$initialize) : y values must be 0 < y < 1

Solution

  • You could replace exact 0s with a value slightly above 0 and exact 1s with a value slightly below 1, so that every observation lies strictly inside (0, 1):

    vege2$X0.5 <- with(vege2, replace(X0.5, X0.5 == 0, .0001))
    vege2$X0.5 <- with(vege2, replace(X0.5, X0.5 == 1, .9999))
    
    glmmTMB::glmmTMB(X0.5 ~ sitio * epoca, data=vege2, family=glmmTMB::beta_family) |>
      summary() |> coef() |> base::`[[`('cond')
    #              Estimate Std. Error    z value  Pr(>|z|)
    # (Intercept) -1.188736  2.1352218 -0.5567274 0.5777137
    # sitio        0.486339  0.6298129  0.7721961 0.4399983
    # epoca        0.862026  1.0322244  0.8351149 0.4036530
    # sitio:epoca -0.276354  0.3023759 -0.9139419 0.3607474
    

    Data:

    vege2 <- expand.grid(sitio=1:5, epoca=1:3)
    set.seed(42)
    vege2$X0.5 <- runif(nrow(vege2))
    vege2$X0.5[c(2, 4, 6)] <- c(0, 1, 1)
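
If you would rather not hard-code .0001 and .9999, a commonly cited alternative is the compression described by Smithson and Verkuilen (2006), which maps every value with (y * (n - 1) + 0.5) / n, where n is the number of observations. The following is only a minimal sketch: `squeeze01` is a hypothetical helper name (it is not part of glmmTMB), and it assumes the column already lies in [0, 1].

```r
# Hypothetical helper (not part of glmmTMB): squeeze proportions from the
# closed interval [0, 1] into the open interval (0, 1).
squeeze01 <- function(y) {
  n <- sum(!is.na(y))          # number of non-missing observations
  (y * (n - 1) + 0.5) / n      # 0 maps to 0.5 / n, 1 maps to (n - 0.5) / n
}

# Same toy data as in the "Data" block
vege2 <- expand.grid(sitio = 1:5, epoca = 1:3)
set.seed(42)
vege2$X0.5 <- runif(nrow(vege2))
vege2$X0.5[c(2, 4, 6)] <- c(0, 1, 1)

# Store the compressed values in a new column and check the range
vege2$X0.5_sq <- squeeze01(vege2$X0.5)
range(vege2$X0.5_sq)
```

With the 15 rows used here, 0 becomes 0.5/15 (about 0.033) and 1 becomes 14.5/15 (about 0.967), which is a much stronger shift than replacing with .0001 and .9999. The fitted coefficients can be sensitive to that choice, so it is probably worth comparing both versions of the model.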