Tags: r, lme4, collaborative-filtering, convergence

problems with model convergence: Error in ans.ret[meth, ] <- c(ans$par, ans$value, ans$fevals, ans$gevals,


I'm using lme4 to build a collaborative filter and running into convergence issues. I tried to work through them using the resources listed below, but am now getting a new error:

Error in ans.ret[meth, ] <- c(ans$par, ans$value, ans$fevals, ans$gevals, :
number of items to replace is not a multiple of replacement length

This happened after the model had been running toward convergence for roughly 48 hours.

  1. https://rstudio-pubs-static.s3.amazonaws.com/33653_57fc7b8e5d484c909b615d8633c01d51.html
  2. https://stats.stackexchange.com/questions/242109/model-failed-to-converge-warning-in-lmer

    note: optimx's nlminb seems best, then L-BFGS-B (sketched below)
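
For reference, the optimizer switches those resources suggest look roughly like this in glmerControl (a sketch, not a guaranteed fix):

```
library(lme4); library(optimx)

# nlminb via optimx (what the model below uses)
ctrl_nlminb <- glmerControl(optimizer = "optimx",
                            optCtrl = list(method = "nlminb"))

# L-BFGS-B via optimx as a fallback
ctrl_lbfgsb <- glmerControl(optimizer = "optimx",
                            optCtrl = list(method = "L-BFGS-B"))
```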

I have a model that's structured as follows:

library(lme4); library(optimx)
library(stringi)
library(data.table)

set.seed(1423L)
# highly imbalanced outcome variable
y <- sample.int(2L, size= 910000, replace=T, prob= c(0.98, 0.02)) - 1L
# product biases
prod <- sample(letters, size= 910000, replace=T)
# user biases: 35,000 user ids, each repeated 26 times (910,000 rows total)
my_grps <- stringi::stri_rand_strings(n= 35000, length= 10)
grps <- rep(my_grps, each= 26)
# sparse binary covariates
x1 <- sample.int(2L, size= 910000, replace=T, prob= c(0.9, 0.1)) - 1L
x2 <- sample.int(2L, size= 910000, replace=T, prob= c(0.9, 0.1)) - 1L
x3 <- sample.int(2L, size= 910000, replace=T, prob= c(0.9, 0.1)) - 1L
# five-level categorical covariate
x4 <- sample(LETTERS[1:5], size= 910000, replace=T)

dt <- data.table(y= y,
                 prod= prod, grps= grps,
                 x1= x1, x2= x2, x3= x3, x4= x4)

lmer1 <- glmer(y ~ -1 + prod + x1 + x2 + x3 + x4 + (1|grps),
               data= dt, family= binomial(link= "logit"),
               control = glmerControl(optimizer ='optimx', optCtrl=list(method='nlminb')))

I haven't verified that the data above reproduces the error, but that's the model setup. I don't understand the error message at all; any help would be appreciated.

Reported as lmer issue #425

NOTE: in my true use case, I have closer to 15.5M observations and 30-50 products where each product has a different average response rate (y)

I have also switched from a kNN approach (the typical collaborative filter) to an HLM because R is poorly optimized for kNN at scale; something like Annoy would be better suited, but I have yet to try it out (see the sketch below).
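
For what it's worth, Annoy is reachable from R via the RcppAnnoy package; a minimal sketch of building an approximate-kNN index (untested on my side, and illustrative only):

```
library(RcppAnnoy)

n_items <- 1000L; n_dim <- 10L
X <- matrix(rnorm(n_items * n_dim), ncol = n_dim)

# build an Annoy index over Euclidean distance
a <- new(AnnoyEuclidean, n_dim)
for (i in seq_len(n_items)) a$addItem(i - 1L, X[i, ])  # Annoy item ids are 0-based
a$build(50L)  # 50 trees: more trees give better recall, a bigger index

# the 10 approximate nearest neighbours of item 0
a$getNNsByItem(0L, 10L)
```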


Solution

  • Fitting the same model with Julia's MixedModels package converges in about a minute:

    ```

    julia> @time m1 = fit!(glmm(@formula(y ~ 0 + prod + x1 + x2 + x3 + x4 + (1|grps)), dt, Bernoulli()), fast = true, verbose=true)
    f_1: 180528.00386 [1.0]
    f_2: 187488.87167 [1.75]
    f_3: 177702.14693 [0.25]
    f_4: 177671.46777 [0.112452]
    f_5: 177676.1792 [0.152245]
    f_6: 177667.49847 [0.0374517]
    f_7: 177667.32134 [0.0285566]
    f_8: 177667.11503 [0.0108968]
    f_9: 177667.08367 [0.00339678]
    f_10: 177667.08031 [0.000203859]
    f_11: 177667.08056 [0.000953859]
    f_12: 177667.0803 [1.35223e-7]
    f_13: 177667.0803 [7.51352e-5]
    f_14: 177667.0803 [7.63522e-6]
    f_15: 177667.0803 [8.85223e-7]
    f_16: 177667.0803 [0.0]
    f_17: 177667.0803 [6.76114e-8]
    f_18: 177667.0803 [1.72723e-7]
    f_19: 177667.0803 [1.20167e-7]
    f_20: 177667.0803 [1.4227e-7]
    f_21: 177667.0803 [1.38746e-7]
    f_22: 177667.0803 [1.36985e-7]
    f_23: 177667.0803 [1.36089e-7]
    f_24: 177667.0803 [1.34357e-7]
    f_25: 177667.0803 [1.35656e-7]
    f_26: 177667.0803 [1.35439e-7]
    f_27: 177667.0803 [1.35323e-7]
    f_28: 177667.0803 [1.35273e-7]
    f_29: 177667.0803 [1.35248e-7]
    f_30: 177667.0803 [0.0]
     75.913228 seconds (174.16 k allocations: 19.486 GiB, 8.10% gc time)
    Generalized Linear Mixed Model fit by minimizing the Laplace approximation to the deviance
      Formula: y ~ 0 + prod + x1 + x2 + x3 + x4 + (1 | grps)
      Distribution: Distributions.Bernoulli{Float64}
      Link: GLM.LogitLink()
    
      Deviance (Laplace approximation): 177667.0803
    
    Variance components:
              Column   Variance Std.Dev.
     grps (Intercept)         0        0
    
     Number of obs: 910000; levels of grouping factors: 35000
    
    Fixed-effects parameters:
               Estimate Std.Error   z value P(>|z|)
    prod: a    -3.82317 0.0402319  -95.0283  <1e-99
    prod: b    -3.87486 0.0411777  -94.1009  <1e-99
    prod: c     -3.8979 0.0414131  -94.1226  <1e-99
    prod: d    -3.90172 0.0416467  -93.6862  <1e-99
    prod: e    -3.94375 0.0423702  -93.0786  <1e-99
    prod: f    -3.87321 0.0411412  -94.1443  <1e-99
    prod: g    -3.84563  0.040832  -94.1818  <1e-99
    prod: h    -3.85295 0.0407992  -94.4371  <1e-99
    prod: i    -3.86082 0.0408777   -94.448  <1e-99
    prod: j    -3.92742 0.0422041  -93.0577  <1e-99
    prod: k    -3.90827 0.0417974   -93.505  <1e-99
    prod: l    -3.90168 0.0415682   -93.862  <1e-99
    prod: m    -3.93383 0.0421348  -93.3629  <1e-99
    prod: n    -3.82755 0.0403628  -94.8286  <1e-99
    prod: o    -3.89546 0.0416489  -93.5311  <1e-99
    prod: p    -3.91643 0.0418437  -93.5966  <1e-99
    prod: q    -3.88423 0.0414074  -93.8054  <1e-99
    prod: r     -3.9031 0.0416133  -93.7944  <1e-99
    prod: s    -3.85363 0.0407327  -94.6079  <1e-99
    prod: t    -3.92431 0.0419838   -93.472  <1e-99
    prod: u    -3.91551 0.0417962   -93.681  <1e-99
    prod: v    -3.92217 0.0417068  -94.0415  <1e-99
    prod: w    -3.90503  0.041674  -93.7043  <1e-99
    prod: x    -3.81516 0.0402678  -94.7447  <1e-99
    prod: y    -3.86918 0.0410894  -94.1648  <1e-99
    prod: z    -3.83903 0.0404826  -94.8316  <1e-99
    x1        0.0302483 0.0247737   1.22098  0.2221
    x2       -0.0311121 0.0253477  -1.22741  0.2197
    x3        0.0183217 0.0248309  0.737858  0.4606
    x4: B     0.0104487 0.0235136  0.444368  0.6568
    x4: C    -0.0170338 0.0236728 -0.719553  0.4718
    x4: D    -0.0356445 0.0238845  -1.49237  0.1356
    x4: E    -0.0303572  0.023757  -1.27782  0.2013
    

    ```

    Notice that it took just over a minute (on a 4-core i5 desktop with 8 GB of memory).

    Without `fast=true` it takes much longer but arrives at essentially the same answers.
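
    (The fit assumes dt has been moved from R into a Julia data frame; one illustrative route, not necessarily the one used here, is writing the table out from R and reading it with Julia's CSV package.)

    ```
    # in R: write the simulated table to disk for Julia to pick up
    library(data.table)
    fwrite(dt, "dt.csv")
    # in Julia (assumed): dt = CSV.read("dt.csv", DataFrame)
    ```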

    It is not surprising that the estimated standard deviation of the random effects is 0: it measures the excess variability associated with the groups beyond that induced by the inherent variability of a Bernoulli response, and the data were simulated with no group effect. The coefficients for the levels of prod are significant because there is no intercept in the model: each one estimates the log-odds of that product's baseline response rate, which is far from zero.
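
    As a quick sanity check (my own addition, using base R's plogis), back-transforming one of the prod estimates recovers the roughly 2% response rate simulated above:

    ```
    # inverse-logit of the "prod: a" estimate from the Julia fit
    plogis(-3.82317)  # ~0.0214, close to the simulated 2% response rate
    ```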