Search code examples
ranovanlmerandom-effects

Error in anova(): "Error in getResponseFormula(el) : 'form' must be a two-sided formula"


I have a longitudinal dataset for plant growth recorded in different seasons. I fitted the data to growth models with/without seasonal effect using nlme function from nlme package. To see the seasonal effect on the estimated parameter (k), I compared the two models using anova function, however, I got "Error in getResponseFormula(el) : 'form' must be a two-sided formula" and stuck in this problem. Can anyone help me solve this problem?

Here is the reproducible example.

data

library(tidyverse)
# generate dummy data
n = 100
id = as.factor(1:n)
season = c("s1","s2","s3","s4")
L0 = rnorm(n,40,5)
td = rnorm(n, 180, 10)

growth <- function(td, Lt_1){
  return(function(k,p){
    150*((1 + ((150/Lt_1)^(1/p)-1)*exp(-k*td/365))^(-p))
  })
}
L1 = growth(Lt_1 = L0, td = td)(p = 1.2, k = 0.55)
L2 = growth(Lt_1 = L1, td = td)(p = 1.2, k = 1.09)
L3 = growth(Lt_1 = L2, td = td)(p = 1.2, k = 0.62)
L4 = growth(Lt_1 = L3, td = td)(p = 1.2, k = 0.97)

df <- data.frame(id = id, L0 = L0, L1 = L1, L2 = L2, L3 = L3, L4 = L4) %>% 
  pivot_longer(cols = L0:L4, names_to = "Ltype", values_to = "Lobs") %>% 
  mutate(Lobs = round(Lobs)) %>% 
  group_by(id) %>% 
  mutate(Lt_1 = lag(Lobs)) %>% 
  filter(!is.na(Lt_1)) %>% 
  mutate(td = round(rnorm(4, 180, 10))) %>% 
  mutate(season = factor(season)) %>% 
  ungroup() %>% 
  mutate(season2 = case_when(
    season == "s3" ~ "s1",
    season == "s4" ~ "s2",
    TRUE ~ season
  )) %>% 
  mutate(season2 = factor(season2)) %>% 
  mutate(log_Lobs = log(Lobs))

> head(df)
# A tibble: 6 x 8
  id    Ltype  Lobs  Lt_1    td season season2 log_Lobs
  <fct> <chr> <dbl> <dbl> <dbl> <fct>  <fct>      <dbl>
1 1     L1       57    48   178 s1     s1          4.04
2 1     L2       76    57   184 s2     s2          4.33
3 1     L3       87    76   188 s3     s1          4.47
4 1     L4      103    87   194 s4     s2          4.63
5 2     L1       43    35   175 s1     s1          3.76
6 2     L2       63    43   176 s2     s2          4.14

fitting

library(nlme)
## model without seasonal effect
formula = Lobs ~ 150*((1 + ((150/Lt_1)^(1/p)-1)*exp(-k*td/365))^(-p))
p = 1.2
k = 0.4

fit_null <- nlme(formula,
              fixed = c(p ~ 1, k ~ 1),
              random = p ~ 1 | id,
              data = df,
              start = list(fixed = c(p, k)),
              na.action = na.exclude,
              control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE))

## model with seasonal effect
p = 1.2
k = c(0.4, 1.09)

fit_withSeason <- nlme(formula,
                fixed = c(p ~ 1, k ~ 1 + season2),
                random = k ~ 1 | id,
                data = df,
                start = list(fixed = c(p, k)),
                na.action = na.exclude,
                control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE))

summary of the two models

> summary(fit_null)
Nonlinear mixed-effects model fit by maximum likelihood
  Model: formula 
  Data: df 
       AIC      BIC    logLik
  2312.975 2328.941 -1152.488

Random effects:
 Formula: p ~ 1 | id
                   p Residual
StdDev: 8.300513e-05 4.315789

Fixed effects:  c(p ~ 1, k ~ 1) 
      Value  Std.Error  DF   t-value p-value
p 0.7335954 0.09075133 299  8.083577       0
k 0.9823510 0.05594228 299 17.560080       0
 Correlation: 
  p     
k -0.969

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-1.4898414 -0.8966244 -0.2154542  0.8401292  2.2148509 

Number of Observations: 400
Number of Groups: 100 

> summary(fit_withSeason)
Nonlinear mixed-effects model fit by maximum likelihood
  Model: formula 
  Data: df 
       AIC      BIC    logLik
  1466.922 1486.879 -728.4608

Random effects:
 Formula: k ~ 1 | id
        k.(Intercept) Residual
StdDev:    0.03294767  1.37547

Fixed effects:  c(p ~ 1, k ~ 1 + season2) 
                  Value  Std.Error  DF  t-value p-value
p             1.9509211 0.16984927 298 11.48619       0
k.(Intercept) 0.5212180 0.01156849 298 45.05498       0
k.season2s2   0.4151107 0.00828721 298 50.09055       0
 Correlation: 
              p      k.(In)
k.(Intercept) -0.868       
k.season2s2   -0.572  0.265

Standardized Within-Group Residuals:
         Min           Q1          Med           Q3          Max 
-2.341255091 -0.727512901  0.003841993  0.622830078  3.190926108 

Number of Observations: 400
Number of Groups: 100 

Error

# model comparison to test the seasonal effect
> anova(fit_null, fit_withSeason)
Error in getResponseFormula(el) : 'form' must be a two-sided formula

Solution

  • This is arguably a bug in nlme, but a tricky one. When you pass the formula as a symbol called "formula" (i.e. "formula", rather than "Lobs ~ "), it fails to extract the formula from the model call properly.

    If you change the variable name of your formula from formula to (e.g.) form1, I think everything should work fine (another reason not to use names of built-in objects as variable names!)

    More generally (for other issues involving improper evaluation of objects stored as symbols) you can work around this by using do.call(), which tricks R into evaluating the formula before storing the call:

    fit_null <- do.call(nlme,
              list(formula,
                   fixed = c(p ~ 1, k ~ 1),
                   random = p ~ 1 | id,
                   data = df,
                   start = list(fixed = c(p, k)),
                   na.action = na.exclude,
                   control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
             ))
    
    ## model with seasonal effect
    p = 1.2
    k = c(0.4, 1.09)
    
    fit_withSeason <- do.call(nlme,
                    list(formula,
                    fixed = c(p ~ 1, k ~ 1 + season2),
                    random = k ~ 1 | id,
                    data = df,
                    start = list(fixed = c(p, k)),
                    na.action = na.exclude,
                    control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)))
    
    anova(fit_null, fit_withSeason)
    
    
                   Model df      AIC      BIC     logLik   Test  L.Ratio p-value
    fit_null           1  4 2317.913 2333.879 -1154.9565                        
    fit_withSeason     2  5 1540.088 1560.045  -765.0438 1 vs 2 779.8253  <.0001
    

    I've submitted this as an R bug