Error in !is.na(v.e) && v.e > 0 when using lme4 for mixed-effects model in R

I did not find anything similar to my issue.

I have single cell expression data from 20 different experiments (29733genes x 24489 cells). We want to see if there are differences in gene expression between levels in a factor cell_types (4 levels) by each level of another variable that has 3 levels (for discretion lets call it factor2).

The objective is to obtain differentially expressed genes when testing these contrasts:

contrast_matrix <- makeContrasts(
  "Cell_typeAvsOther_Factor2:level1" = Cell_typeA_Factor2_level1 - (Cell_typeB_Factor2_level1 + Cell_typeC_Factor2_level1 + Cell_typeD_Factor2_level1) / 3,
  "Cell_typeAvsOther_Factor2:level2" = Cell_typeA_Factor2_level2 - (Cell_typeB_Factor2_level2 + Cell_typeC_Factor2_level2 + Cell_typeD_Factor2_level2) / 3,
  "Cell_typeAvsOther_Factor2:level3" = Cell_typeA_Factor2_level3 - (Cell_typeB_Factor2_level3 + Cell_typeC_Factor2_level3 + Cell_typeD_Factor2_level3) / 3, 
  
  "Cell_typeBvsOther_Factor2:level1" = Cell_typeB_Factor2_level1 - (Cell_typeA_Factor2_level1 + Cell_typeC_Factor2_level1 + Cell_typeD_Factor2_level1) / 3,
  "Cell_typeBvsOther_Factor2:level2" = Cell_typeB_Factor2_level2 - (Cell_typeA_Factor2_level2 + Cell_typeC_Factor2_level2 + Cell_typeD_Factor2_level2) / 3,
  "Cell_typeBvsOther_Factor2:level3" = Cell_typeB_Factor2_level3 - (Cell_typeA_Factor2_level3 + Cell_typeC_Factor2_level3 + Cell_typeD_Factor2_level3) / 3, 
  
  "Cell_typeCvsOther_Factor2:level1" = Cell_typeC_Factor2_level1 - (Cell_typeB_Factor2_level1 + Cell_typeA_Factor2_level1 + Cell_typeD_Factor2_level1) / 3,
  "Cell_typeCvsOther_Factor2:level2" = Cell_typeC_Factor2_level2 - (Cell_typeB_Factor2_level2 + Cell_typeA_Factor2_level2 + Cell_typeD_Factor2_level2) / 3,
  "Cell_typeCvsOther_Factor2:level3" = Cell_typeC_Factor2_level3 - (Cell_typeB_Factor2_level3 + Cell_typeA_Factor2_level3 + Cell_typeD_Factor2_level3) / 3,
  
  "Cell_typeDvsOther_Factor2:level1" = Cell_typeD_Factor2_level1 - (Cell_typeB_Factor2_level1 + Cell_typeA_Factor2_level1 + Cell_typeC_Factor2_level1) / 3,
  "Cell_typeDvsOther_Factor2:level2" = Cell_typeD_Factor2_level2 - (Cell_typeB_Factor2_level2 + Cell_typeA_Factor2_level2 + Cell_typeC_Factor2_level2) / 3,
  "Cell_typeDvsOther_Factor2:level3" = Cell_typeD_Factor2_level3 - (Cell_typeB_Factor2_level3 + Cell_typeA_Factor2_level3 + Cell_typeC_Factor2_level3) / 3, 
  levels = design_matrix
)

To simplify my model I combined cell type and factor2 into a unified factor.

Then we have batch factor. Here I show you the distribution of cell types across batches. As you can see some batches are celltype specific:

           celltypeA celltypeB celltypeC celltypeD
  batch_1          0         0       441         0
  batch_10         0        63         3        58
  batch_11       181        13         7        11
  batch_12         2        36         2      1144
  batch_13         0        16         2       876
  batch_14        58       265      3226       469
  batch_15        19        66       115       858
  batch_16         0        73       100      1996
  batch_17         0        29       783         1
  batch_18         2        89       192       884
  batch_19      1152       459        33        71
  batch_2        193         3         0         0
  batch_20         0      2219         2         2
  batch_3        198        63         0         0
  batch_4          8         3         3       208
  batch_5          8        12       499         2
  batch_6       1878       465         8         1
  batch_7       3642        72       293        46
  batch_8          0       206         0         1
  batch_9        602        52         2         3

Using a fix effects model with a design matrix such as :

design_matrix <- model.matrix(
  ~ 0 + Celltype_factor2 + Batch ,
  data = cellinfo,
  contrasts.arg = lapply(cellinfo[, sapply(cellinfo, is.factor), drop = FALSE],
                         contrasts, contrasts = FALSE))

Was giving me a design matrix that was not full rank so it was making me remove one of the coefficients, which I feel is not appropiate.

Additionally, I kept reading many questions and threads about this and I arrived to the conclusion that batch should be a random effect. Mainly because I do not want to just analyze the differences in expression for these specific batches. I tried to implement the following mixed-effects model using lme4 package(1.1-34)in R (R version 4.3.1 (2023-06-16 ucrt)).

model <- lmer(gene_expression ~ Celltype_factor2 + (1 | Batch), data=cellinfo)

But I am getting the following error (note: I ran it in a subset of my data. I tried subsetting only cells, subsetting both genes and cells. I checked for NAs, all the levels were also included, etc...):

Error in !is.na(v.e) && v.e > 0 : 
  'length = 90000' in coercion to 'logical(1)'

I did not find anybody having this issue. I would appreciate some insight, either model-wise or error-wise.

Solution

This really ought to have a more informative error message (it does now in the development version), but the problem is that (unlike lm()), lmer can't handle matrix-valued response variables. Example:

dd <- data.frame(x = rnorm(1000), batch = factor(rep(1:20, each=50)))
dd$y <- matrix(rnorm(1e4), ncol = 10)
lmer(y ~ x + (1|batch), dd)

Error in !is.na(v.e) && v.e > 0 : 'length = 100' in coercion to 'logical(1)'

The proper way to handle this case is by using refit:

ans <- vector("list", length = ncol(dd$y))
ans[[1]] <- lmer(y[,1] ~ x + (1|batch), dd)
for (i in 2:ncol(dd$y)) {
   ans[[i]] <- refit(ans[[1]], newresp = dd$y[,i])
}

(The example in ?refit uses lapply(), but I think this might be clearer ...)