Tags: r, lme4, collaborative-filtering, convergence

problems with model convergence: Error in ans.ret[meth, ] <- c(ans$par, ans$value, ans$fevals, ans$gevals,


I'm using lme4 to build a collaborative filter and running into convergence issues. I tried to work through them using the resources listed below, but am now getting a new error:

Error in ans.ret[meth, ] <- c(ans$par, ans$value, ans$fevals, ans$gevals, :
number of items to replace is not a multiple of replacement length

This happened after the model had been running toward convergence for roughly 48 hours.

  1. https://rstudio-pubs-static.s3.amazonaws.com/33653_57fc7b8e5d484c909b615d8633c01d51.html
  2. https://stats.stackexchange.com/questions/242109/model-failed-to-converge-warning-in-lmer

    note: optimx's nlminb seems best, then L-BFGS-B (sketched below)
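
For reference, the optimizer switches those resources suggest look roughly like this in glmerControl (a sketch, not a guaranteed fix):

```
library(lme4); library(optimx)

# nlminb via optimx (what the model below uses)
ctrl_nlminb <- glmerControl(optimizer = "optimx",
                            optCtrl = list(method = "nlminb"))

# L-BFGS-B via optimx as a fallback
ctrl_lbfgsb <- glmerControl(optimizer = "optimx",
                            optCtrl = list(method = "L-BFGS-B"))
```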

I have a model that's structured as follows:

library(lme4); library(optimx)
library(stringi)
library(data.table)

set.seed(1423L)
# highly imbalanced outcome variable
y <- sample.int(2L, size= 910000, replace=T, prob= c(0.98, 0.02)) - 1L
# product biases
prod <- sample(letters, size= 910000, replace=T)
# user biases: 35,000 user ids, each repeated 26 times (910,000 rows total)
my_grps <- stringi::stri_rand_strings(n= 35000, length= 10)
grps <- rep(my_grps, each= 26)
# sparse binary covariates
x1 <- sample.int(2L, size= 910000, replace=T, prob= c(0.9, 0.1)) - 1L
x2 <- sample.int(2L, size= 910000, replace=T, prob= c(0.9, 0.1)) - 1L
x3 <- sample.int(2L, size= 910000, replace=T, prob= c(0.9, 0.1)) - 1L
# five-level categorical covariate
x4 <- sample(LETTERS[1:5], size= 910000, replace=T)

dt <- data.table(y= y,
                 prod= prod, grps= grps,
                 x1= x1, x2= x2, x3= x3, x4= x4)

lmer1 <- glmer(y ~ -1 + prod + x1 + x2 + x3 + x4 + (1|grps),
               data= dt, family= binomial(link= "logit"),
               control = glmerControl(optimizer ='optimx', optCtrl=list(method='nlminb')))

I haven't verified that the data above reproduces the error, but that's the model setup. I don't understand the error message at all; any help would be appreciated.

Reported as lmer issue #425

NOTE: in my true use case, I have closer to 15.5M observations and 30-50 products where each product has a different average response rate (y)

I have also switched from a kNN approach (the typical collaborative filter) to an HLM because R is poorly optimized for kNN at scale; something like Annoy would be better suited, but I have yet to try it out (see the sketch below).
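
For what it's worth, Annoy is reachable from R via the RcppAnnoy package; a minimal sketch of building an approximate-kNN index (untested on my side, and illustrative only):

```
library(RcppAnnoy)

n_items <- 1000L; n_dim <- 10L
X <- matrix(rnorm(n_items * n_dim), ncol = n_dim)

# build an Annoy index over Euclidean distance
a <- new(AnnoyEuclidean, n_dim)
for (i in seq_len(n_items)) a$addItem(i - 1L, X[i, ])  # Annoy item ids are 0-based
a$build(50L)  # 50 trees: more trees give better recall, a bigger index

# the 10 approximate nearest neighbours of item 0
a$getNNsByItem(0L, 10L)
```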


Solution

  • Fitting the same model with Julia's MixedModels package converges in about a minute:

    ```

    julia> @time m1 = fit!(glmm(@formula(y ~ 0 + prod + x1 + x2 + x3 + x4 + (1|grps)), dt, Bernoulli()), fast = true, verbose=true)
    f_1: 180528.00386 [1.0]
    f_2: 187488.87167 [1.75]
    f_3: 177702.14693 [0.25]
    f_4: 177671.46777 [0.112452]
    f_5: 177676.1792 [0.152245]
    f_6: 177667.49847 [0.0374517]
    f_7: 177667.32134 [0.0285566]
    f_8: 177667.11503 [0.0108968]
    f_9: 177667.08367 [0.00339678]
    f_10: 177667.08031 [0.000203859]
    f_11: 177667.08056 [0.000953859]
    f_12: 177667.0803 [1.35223e-7]
    f_13: 177667.0803 [7.51352e-5]
    f_14: 177667.0803 [7.63522e-6]
    f_15: 177667.0803 [8.85223e-7]
    f_16: 177667.0803 [0.0]
    f_17: 177667.0803 [6.76114e-8]
    f_18: 177667.0803 [1.72723e-7]
    f_19: 177667.0803 [1.20167e-7]
    f_20: 177667.0803 [1.4227e-7]
    f_21: 177667.0803 [1.38746e-7]
    f_22: 177667.0803 [1.36985e-7]
    f_23: 177667.0803 [1.36089e-7]
    f_24: 177667.0803 [1.34357e-7]
    f_25: 177667.0803 [1.35656e-7]
    f_26: 177667.0803 [1.35439e-7]
    f_27: 177667.0803 [1.35323e-7]
    f_28: 177667.0803 [1.35273e-7]
    f_29: 177667.0803 [1.35248e-7]
    f_30: 177667.0803 [0.0]
     75.913228 seconds (174.16 k allocations: 19.486 GiB, 8.10% gc time)
    Generalized Linear Mixed Model fit by minimizing the Laplace approximation to the deviance
      Formula: y ~ 0 + prod + x1 + x2 + x3 + x4 + (1 | grps)
      Distribution: Distributions.Bernoulli{Float64}
      Link: GLM.LogitLink()
    
      Deviance (Laplace approximation): 177667.0803
    
    Variance components:
              Column   Variance Std.Dev.
     grps (Intercept)         0        0
    
     Number of obs: 910000; levels of grouping factors: 35000
    
    Fixed-effects parameters:
               Estimate Std.Error   z value P(>|z|)
    prod: a    -3.82317 0.0402319  -95.0283  <1e-99
    prod: b    -3.87486 0.0411777  -94.1009  <1e-99
    prod: c     -3.8979 0.0414131  -94.1226  <1e-99
    prod: d    -3.90172 0.0416467  -93.6862  <1e-99
    prod: e    -3.94375 0.0423702  -93.0786  <1e-99
    prod: f    -3.87321 0.0411412  -94.1443  <1e-99
    prod: g    -3.84563  0.040832  -94.1818  <1e-99
    prod: h    -3.85295 0.0407992  -94.4371  <1e-99
    prod: i    -3.86082 0.0408777   -94.448  <1e-99
    prod: j    -3.92742 0.0422041  -93.0577  <1e-99
    prod: k    -3.90827 0.0417974   -93.505  <1e-99
    prod: l    -3.90168 0.0415682   -93.862  <1e-99
    prod: m    -3.93383 0.0421348  -93.3629  <1e-99
    prod: n    -3.82755 0.0403628  -94.8286  <1e-99
    prod: o    -3.89546 0.0416489  -93.5311  <1e-99
    prod: p    -3.91643 0.0418437  -93.5966  <1e-99
    prod: q    -3.88423 0.0414074  -93.8054  <1e-99
    prod: r     -3.9031 0.0416133  -93.7944  <1e-99
    prod: s    -3.85363 0.0407327  -94.6079  <1e-99
    prod: t    -3.92431 0.0419838   -93.472  <1e-99
    prod: u    -3.91551 0.0417962   -93.681  <1e-99
    prod: v    -3.92217 0.0417068  -94.0415  <1e-99
    prod: w    -3.90503  0.041674  -93.7043  <1e-99
    prod: x    -3.81516 0.0402678  -94.7447  <1e-99
    prod: y    -3.86918 0.0410894  -94.1648  <1e-99
    prod: z    -3.83903 0.0404826  -94.8316  <1e-99
    x1        0.0302483 0.0247737   1.22098  0.2221
    x2       -0.0311121 0.0253477  -1.22741  0.2197
    x3        0.0183217 0.0248309  0.737858  0.4606
    x4: B     0.0104487 0.0235136  0.444368  0.6568
    x4: C    -0.0170338 0.0236728 -0.719553  0.4718
    x4: D    -0.0356445 0.0238845  -1.49237  0.1356
    x4: E    -0.0303572  0.023757  -1.27782  0.2013
    

    ```

    Notice that it took just over a minute (on a 4-core i5 desktop with 8 GB of memory).

    Without `fast=true` it takes much longer but arrives at essentially the same answers.
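
    (The fit assumes dt has been moved from R into a Julia data frame; one illustrative route, not necessarily the one used here, is writing the table out from R and reading it with Julia's CSV package.)

    ```
    # in R: write the simulated table to disk for Julia to pick up
    library(data.table)
    fwrite(dt, "dt.csv")
    # in Julia (assumed): dt = CSV.read("dt.csv", DataFrame)
    ```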

    It is not surprising that the estimated standard deviation of the random effects is 0: it measures the excess variability associated with the groups beyond that induced by the inherent variability of a Bernoulli response, and the data were simulated with no group effect. The coefficients for the levels of prod are significant because there is no intercept in the model: each one estimates the log-odds of that product's baseline response rate, which is far from zero.
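
    As a quick sanity check (my own addition, using base R's plogis), back-transforming one of the prod estimates recovers the roughly 2% response rate simulated above:

    ```
    # inverse-logit of the "prod: a" estimate from the Julia fit
    plogis(-3.82317)  # ~0.0214, close to the simulated 2% response rate
    ```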