
How to pass a formula object to DirichReg (setting up for function)


I am trying to pass a formula object to a Dirichlet regression, using the DirichReg() function from the DirichletReg package in R. As shown below, the function does not seem able to accept a formula stored in a variable, but nothing in the documentation notes this limitation. The reason for this workflow is that I am trying to set up a cross-validation function that can be applied over a list of different formulas (i.e. with different covariates) and return the out-of-sample predictive ability to help with model selection.

library(DirichletReg)

df <- ArcticLake  # plug-in your data here
df$Y <- DR_data(df[,1:3])  # prepare the Y's
Warning in DR_data(df[, 1:3]) :
  not all rows sum up to 1 => normalization forced

formula <- reformulate(termlabels = "depth", response="Y")

mod <- DirichReg(formula, df)

Error: object of type 'symbol' is not subsettable
Error during wrapup: 

mod <- DirichReg(Y~depth, df)

str(Y~depth)

Class 'formula'  language Y ~ depth
  ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 

str(formula)

Class 'formula'  language Y ~ depth
  ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 

formula <- as.formula("Y~depth")
mod <- DirichReg(formula, df)

Error: object of type 'symbol' is not subsettable
Error during wrapup: 

There doesn't seem to be any difference between my 'formula' object and the formula as specified in the working DirichReg call.

My guess is that it has something to do with the way the response variable is formatted by DR_data(), but I can't figure out a way around this that lets me specify formulas on the fly inside a function.

> str(df$Y)
 DirichletRegData [1:39, 1:3] 0.775 0.719 0.507 0.524 0.7 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:39] "1" "2" "3" "4" ...
  ..$ : chr [1:3] "sand" "silt" "clay"
 - attr(*, "Y.original")='data.frame':  39 obs. of  3 variables:
  ..$ sand: num [1:39] 0.775 0.719 0.507 0.522 0.7 0.665 0.431 0.534 0.155 0.317 ...
  ..$ silt: num [1:39] 0.195 0.249 0.361 0.409 0.265 0.322 0.553 0.368 0.544 0.415 ...
  ..$ clay: num [1:39] 0.03 0.032 0.132 0.066 0.035 0.013 0.016 0.098 0.301 0.268 ...
 - attr(*, "dims")= int 3
 - attr(*, "dim.names")= chr [1:3] "sand" "silt" "clay"
 - attr(*, "obs")= int 39
 - attr(*, "valid_obs")= int 39
 - attr(*, "normalized")= logi TRUE
 - attr(*, "transformed")= logi FALSE
 - attr(*, "base")= num 1

Solution

  • @Smiley Bcc may have been hinting at this, but it appears that you have to call as.formula() inside the DirichReg() call itself, rather than passing in a formula object created beforehand. From your example data above:

    > f <- as.formula('Y~depth')
    > mod <- DirichReg(f, df)
    Error: object of type 'symbol' is not subsettable
    
    > f <- 'Y~depth'
    > mod <- DirichReg(as.formula(f), df)
    

    Interestingly, it doesn't work (probably for different reasons) when you literally name the object "formula":

    > formula <- 'Y~depth'
    > mod <- DirichReg(as.formula(formula), df)
    Error: object of type 'closure' is not subsettable
    

    I assume there's some kind of direct reference to an object called formula inside the DirichReg() function, so avoid using that name for your own object.
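
The two error messages above are consistent with a function that inspects its formula argument unevaluated. The sketch below reproduces the first one in base R. toy_fit() is a hypothetical stand-in written for illustration, not code taken from DirichletReg, and whether DirichReg() behaves this way internally is an assumption I have not checked against its source.

```r
# Hypothetical stand-in for a modelling function that looks at the
# *unevaluated* formula argument. This is NOT DirichletReg code.
toy_fit <- function(formula, data) {
  mc <- match.call()
  # mc$formula is whatever expression the caller typed. Subsetting it works
  # for a literal `Y ~ depth` (a call) but not for a bare variable name.
  deparse(mc$formula[[2L]])
}

f <- Y ~ depth

# Literal formula: the captured expression is the call `Y ~ depth`.
res_literal <- toy_fit(Y ~ depth)

# Formula stored in a variable: the captured expression is the symbol `f`,
# so subsetting it fails. The message is caught instead of stopping.
res_symbol <- tryCatch(toy_fit(f), error = function(e) conditionMessage(e))

# do.call() evaluates its arguments before building the call, so the
# formula object itself ends up in the captured call.
res_docall <- do.call(toy_fit, list(f))

print(res_literal)
print(res_symbol)
print(res_docall)
```

If DirichReg() does work along these lines, do.call(DirichReg, list(f, df)) might be another way to pass a pre-built formula from inside a cross-validation function, but that is untested here, so treat it as something to try rather than a confirmed fix.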