Search code examples
rdirichlet

Dominance analysis with Dirichlet regression: error related to formula syntax?


The goal

I want to run dominance analysis on a Dirichlet regression, to approximate the relative importance of a set of predictors (scaled continuous predictors, continuous predictors with splines, and factors). Dirichlet regression is an extension of beta regression to model proportions that are not derived from counts, and that are split between more than 2 categories, see Douma&weedon (2019).

The modelling approach: the syntax is potentially important

I am using the DirichletReg package to fit a Dirichlet regression, with an "alternative" parametrization: this allows to estimate simultaneously the parameters and the precision of the estimation. The syntax is: response ~ parameters | precision. The estimation of parameters can be done with different predictors from those used to estimate precision: response ~ predictor1 + predictor2 | predictor3. If left undeclared, the model assumes fixed precision: response ~ predictors, which can be declared explicilty as: response ~ predictors | 1.

I think that the error is related to the vertical bar in the formula, which separates the predictors used to estimate parameters from the predictors used to estimate precision.

I rely on performance::r2() to calculate a metric of model quality: Nagelkerke's pseudo-R2. However, for the actual analysis, I am thinking of either McFadden or Estrella's pseudo-R2, as they appear suitable to run dominance analysis on multinomial responses, see Luchman 2014.

The obstacle

I get the error message: "fitstat requires at least two elements".

A reproducible example

From data available in the DirichletReg package. The response is only two categories, but in any case, it yields the same error message as in the actual analysis.

library(DirichletReg)
#> Warning: package 'DirichletReg' was built under R version 4.1.3
#> Loading required package: Formula
#> Warning: package 'Formula' was built under R version 4.1.1
library(domir)
library(performance)
#> Warning: package 'performance' was built under R version 4.1.3

# Assemble data
RS <- ReadingSkills
RS$acc <- DR_data(RS$accuracy)
#> only one variable in [0, 1] supplied - beta-distribution assumed.
#> check this assumption.
RS$dyslexia <- C(RS$dyslexia, treatment)

# Fit Dirichlet regression
rs2 <- DirichReg(acc ~ dyslexia + iq | dyslexia + iq, data = RS, model = "alternative")

summary(rs2)
#> Call:
#> DirichReg(formula = acc ~ dyslexia + iq | dyslexia + iq, data = RS, model =
#> "alternative")
#> 
#> Standardized Residuals:
#>                   Min       1Q  Median      3Q     Max
#> 1 - accuracy  -1.5279  -0.7798  -0.343  0.6992  2.4213
#> accuracy      -2.4213  -0.6992   0.343  0.7798  1.5279
#> 
#> MEAN MODELS:
#> ------------------------------------------------------------------
#> Coefficients for variable no. 1: 1 - accuracy
#> - variable omitted (reference category) -
#> ------------------------------------------------------------------
#> Coefficients for variable no. 2: accuracy
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  2.22386    0.28087   7.918 2.42e-15 ***
#> dyslexiayes -1.81261    0.29696  -6.104 1.04e-09 ***
#> iq          -0.02676    0.06900  -0.388    0.698    
#> ------------------------------------------------------------------
#> 
#> PRECISION MODEL:
#> ------------------------------------------------------------------
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  1.71017    0.32697   5.230 1.69e-07 ***
#> dyslexiayes  2.47521    0.55055   4.496 6.93e-06 ***
#> iq           0.04097    0.27537   0.149    0.882    
#> ------------------------------------------------------------------
#> Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Log-likelihood: 61.26 on 6 df (33 BFGS + 1 NR Iterations)
#> AIC: -110.5, BIC: -99.81
#> Number of Observations: 44
#> Links: Logit (Means) and Log (Precision)
#> Parametrization: alternative
as.numeric(performance::r2(rs2))
#> [1] 0.4590758

# Run dominance analysis: error

# If left undeclared, the model assumes fixed precision: parameters |  1
domir::domin(acc ~ dyslexia + iq,
             reg =  function(y)  DirichletReg::DirichReg(y, data = RS, model = "alternative"),
             fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), "r2.nagelkerke"))
)
#> Error in domir::domin(acc ~ dyslexia + iq, reg = function(y) DirichletReg::DirichReg(y, : fitstat requires at least two elements.

domir::domin(acc ~ dyslexia + iq | 1,
             reg =  function(y)  DirichletReg::DirichReg(y, data = RS, model = "alternative"),
             fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), "r2.nagelkerke"))
             )
#> Error in domir::domin(acc ~ dyslexia + iq | 1, reg = function(y) DirichletReg::DirichReg(y, : fitstat requires at least two elements.

domir::domin(acc ~ dyslexia + iq | dyslexia + iq,
             reg =  function(y)  DirichletReg::DirichReg(y, data = RS, model = "alternative"),
             fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), "r2.nagelkerke"))
             )
#> Error in domir::domin(acc ~ dyslexia + iq | dyslexia + iq, reg = function(y) DirichletReg::DirichReg(y, : fitstat requires at least two elements.

domir::domin(acc ~ dyslexia + iq,
             reg =  function(y)  DirichletReg::DirichReg(y, data = RS, model = "alternative"),
             fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), "r2.nagelkerke")),
             consmodel = "| dyslexia + iq"
             )
#> Error in domir::domin(acc ~ dyslexia + iq, reg = function(y) DirichletReg::DirichReg(y, : fitstat requires at least two elements.

sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252   
#> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C                  
#> [5] LC_TIME=Spanish_Spain.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] performance_0.10.0 domir_1.0.1        DirichletReg_0.7-1 Formula_1.2-4     
#> 
#> loaded via a namespace (and not attached):
#>  [1] rstudioapi_0.13  knitr_1.38       magrittr_2.0.3   insight_0.19.1  
#>  [5] lattice_0.20-44  rlang_1.1.0      fastmap_1.1.0    stringr_1.5.0   
#>  [9] highr_0.9        tools_4.1.0      grid_4.1.0       xfun_0.30       
#> [13] cli_3.6.0        withr_2.5.0      htmltools_0.5.2  maxLik_1.5-2    
#> [17] miscTools_0.6-28 yaml_2.3.5       digest_0.6.29    lifecycle_1.0.3 
#> [21] vctrs_0.6.1      fs_1.5.2         glue_1.6.2       evaluate_0.15   
#> [25] rmarkdown_2.13   sandwich_3.0-1   reprex_2.0.1     stringi_1.7.6   
#> [29] compiler_4.1.0   generics_0.1.2   zoo_1.8-9

Created on 2023-07-27 by the reprex package (v2.0.1)

References

Luchman Relative Importance Analysis With Multicategory Dependent Variables:: An Extension and Review of Best Practices (2014) Organizational research methods

Douma & Weedon. Analysing continuous proportions in ecology and evolution: A practical introduction to beta and Dirichlet regression (2019) Methods in Ecology and Evolution


Solution

  • Interestingly, the solution provided on July 2023 no longer worked in February 2024. Building on the code provided by Joseph Luchman, I could make domir compatible with a Dirichlet model, by fitting a Dirichlet model outside domir, and using update inside the domir call to pass the formula. This was also discussed over at GitHub. Infinite thanks to J. Luchman for the persistance and assistance!

    library(DirichletReg)
    #> Warning: package 'DirichletReg' was built under R version 4.1.3
    #> Loading required package: Formula
    #> Warning: package 'Formula' was built under R version 4.1.1
    library(domir)
    library(performance)
    #> Warning: package 'performance' was built under R version 4.1.3
    
    RS <- ReadingSkills
    response.d <- DR_data(RS$accuracy)
    #> only one variable in [0, 1] supplied - beta-distribution assumed.
    #> check this assumption.
    
    
    # Fit Dirichlet regression
    rs2 <- DirichReg(response.d ~ dyslexia + iq | dyslexia + iq, data = RS, model = "alternative")
    performance::r2( rs2)[[1]]
    #> Nagelkerke's R2 
    #>       0.4590758
    
    
    domir::domir(response.d ~ dyslexia + iq,
                 function(y)  {
                   iv <- attr(terms(y), "term.labels")
                   fml <- paste0("response.d ~ ", paste0(iv, collapse = "+"), "| dyslexia + iq", collapse = "")
                   print(fml)
                   performance::r2( update(rs2, formula(fml) ) )[[1]]})  
    #> [1] "response.d ~ dyslexia+iq| dyslexia + iq"
    #> [1] "response.d ~ dyslexia| dyslexia + iq"
    #> [1] "response.d ~ iq| dyslexia + iq"
    #> Overall Value:      0.4590758 
    #> 
    #> General Dominance Values:
    #>          General Dominance Standardized Ranks
    #> dyslexia       0.455042413   0.99121418     1
    #> iq             0.004033357   0.00878582     2
    #> 
    #> Conditional Dominance Values:
    #>          Subset Size: 1 Subset Size: 2
    #> dyslexia    0.457263643    0.452821183
    #> iq          0.006254587    0.001812127
    #> 
    #> Complete Dominance Designations:
    #>                  Dmnated?dyslexia Dmnated?iq
    #> Dmnates?dyslexia               NA       TRUE
    #> Dmnates?iq                  FALSE         NA
    
    sessionInfo()
    #> R version 4.1.0 (2021-05-18)
    #> Platform: x86_64-w64-mingw32/x64 (64-bit)
    #> Running under: Windows 10 x64 (build 19045)
    #> 
    #> Matrix products: default
    #> 
    #> locale:
    #> [1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252   
    #> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C                  
    #> [5] LC_TIME=Spanish_Spain.1252    
    #> 
    #> attached base packages:
    #> [1] stats     graphics  grDevices utils     datasets  methods   base     
    #> 
    #> other attached packages:
    #> [1] performance_0.10.0 domir_1.0.1        DirichletReg_0.7-1 Formula_1.2-4     
    #> 
    #> loaded via a namespace (and not attached):
    #>  [1] rstudioapi_0.13  knitr_1.38       magrittr_2.0.3   insight_0.19.1  
    #>  [5] lattice_0.20-44  rlang_1.1.0      fastmap_1.1.0    stringr_1.5.0   
    #>  [9] highr_0.9        tools_4.1.0      grid_4.1.0       xfun_0.39       
    #> [13] cli_3.6.0        withr_2.5.2      htmltools_0.5.5  maxLik_1.5-2    
    #> [17] miscTools_0.6-28 yaml_2.3.5       digest_0.6.29    lifecycle_1.0.4 
    #> [21] vctrs_0.6.1      fs_1.5.2         glue_1.6.2       evaluate_0.15   
    #> [25] rmarkdown_2.13   sandwich_3.0-1   reprex_2.0.1     stringi_1.7.6   
    #> [29] compiler_4.1.0   generics_0.1.2   zoo_1.8-9
    

    Created on 2024-02-26 by the reprex package (v2.0.1)