Resolving "'qr' and 'y' must have the same number of rows" when the dimensions are the same

Core question: what debugging can be done for "'qr' and 'y' must have the same number of rows"? I'm running an RDA, which I've done dozens of times before. The only difference this time is that I'm using different data, naturally.

Tried and true code line:

rda_20<-rda(bio_data,abio_data)

I'm getting the error:

Error in qr.fitted(Q, Y): 'qr' and 'y' must have the same number of rows

I checked the dimensions of my source data: bio: 9865 x 259; abio: 9865 x 6. A type check shows all data are numeric.

The only two remaining sources of this error I can see are: (1) there are a number of NAs in the abiotic data, but this has never been an issue for me before, as na.action's default handles missing data; (2) some columns have a sum of 0 (left in for comparing between multiple datasets). If the zero-sum columns could cause the issue, can I somehow subset by colSums != 0 to trim those columns quickly? If not, where else can I look to track down the source of this error?

Update: I've removed the columns where colSums == 0. No effect.

Update 2: removing NAs had no effect.

Solution

If you don't use the formula interface, na.action doesn't apply, and you aren't using the formula interface.
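As a rough illustration of how two inputs with identical dim() can still trigger this message, here is a base-R sketch. It assumes (this is my assumption, not something confirmed about vegan's internals) that the constraints get expanded with model.matrix(), which silently drops rows containing NA under R's default na.action, while the response keeps all of its rows. The objects bio and abio are hypothetical stand-ins for bio_data and abio_data.

```r
# Hypothetical stand-ins for bio_data / abio_data (not the asker's data)
set.seed(42)
bio  <- as.data.frame(matrix(rpois(8 * 3, lambda = 5), ncol = 3))
abio <- as.data.frame(matrix(rnorm(8 * 2), ncol = 2))
abio[c(2, 5), 1] <- NA            # two rows with missing abiotic values

# Same number of rows on paper
nrow(bio) == nrow(abio)

# How many rows are actually usable?
sum(complete.cases(abio))

# ASSUMPTION: the constraints are expanded via model.matrix(); with the
# default options("na.action") of na.omit, rows with NA are dropped here
X <- model.matrix(~ ., data = abio)
nrow(X)

# Projecting the full response onto the shortened design reproduces the message
msg <- tryCatch(
  qr.fitted(qr(X), as.matrix(bio)),
  error = function(e) conditionMessage(e)
)
msg
```

If sum(complete.cases(abio_data)) is smaller than nrow(bio_data) on your real data, missing values are the likely culprit.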

Although I don't recommend it, the direct formula-based alternative to the code you showed is:

rda(bio_data ~ ., data = abio_data, na.action = na.exclude)

(However, in testing, even that doesn't seem to exclude all the NAs if they exist in both data frames.)

I don't recommend that people do this, as using . is lazy and promotes poor statistical practice with constrained ordinations. Instead, one should explicitly specify the terms wanted in the model on the right-hand side of the formula.
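A minimal sketch of what an explicit right-hand side can look like, using the varespec and varechem example data that ship with vegan. The choice of N, P and K as terms is purely illustrative, not a modelling recommendation, and the two NA values are injected by hand (hypothetical) only to exercise na.action. The missing values are placed in the constraints only, given the caveat above about NAs in both data frames.

```r
library(vegan)

data(varespec)   # community (response) data shipped with vegan
data(varechem)   # matching soil chemistry (constraints)

# Hypothetical: blank out two constraint values to mimic missing abiotic data
varechem$N[c(3, 7)] <- NA

# Name the constraints explicitly instead of using `.`
ord <- rda(varespec ~ N + P + K, data = varechem, na.action = na.exclude)
ord
```

With your own data, replace N + P + K with the abiotic variables you actually have a hypothesis about.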

If you wish to continue with the default interface, passing two matrices, you can use complete.cases() on both data frames to get logical vectors indicating which rows have no missing data. Then take the intersection (elementwise AND) of those logical vectors to get the common set of rows with no missing values, and use it to subset both data frames so that only rows complete in both remain:

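# Simulate a response (df1) and a set of constraints (df2), each with a few NAs
set.seed(1)
df1 <- matrix(rpois(10*20, lambda = 5), ncol = 10)
df2 <- matrix(rlnorm(5*20), ncol = 5)
df1[sample(10*20, 5)] <- NA
df2[sample(5*20, 5)] <- NA
df1 <- as.data.frame(df1)
df2 <- as.data.frame(df2)

# TRUE for rows with no missing values in each data frame
c1 <- complete.cases(df1)
c2 <- complete.cases(df2)
# TRUE only for rows that are complete in both
c12 <- c1 & c2

# Keep the same rows in both data frames
df1_sub <- df1[c12, ]
df2_sub <- df2[c12, ]

library('vegan')
ord <- rda(df1_sub, df2_sub)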
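As a small variation on the same idea, complete.cases() accepts several objects at once and returns TRUE only for rows that are complete in all of them, so the two-step AND can be collapsed into one call. The sketch below rebuilds the simulated data so it runs on its own, and adds a few sanity checks you might run before calling rda(); the name keep is just an illustrative choice.

```r
# Rebuild the simulated data so this block is self-contained
set.seed(1)
df1 <- matrix(rpois(10 * 20, lambda = 5), ncol = 10)
df2 <- matrix(rlnorm(5 * 20), ncol = 5)
df1[sample(10 * 20, 5)] <- NA
df2[sample(5 * 20, 5)] <- NA
df1 <- as.data.frame(df1)
df2 <- as.data.frame(df2)

# One call: rows that are complete in both data frames
keep <- complete.cases(df1, df2)

df1_sub <- df1[keep, , drop = FALSE]
df2_sub <- df2[keep, , drop = FALSE]

# Sanity checks before passing the subsets to rda(df1_sub, df2_sub)
stopifnot(
  all(keep == (complete.cases(df1) & complete.cases(df2))),
  nrow(df1_sub) == nrow(df2_sub),
  !anyNA(df1_sub),
  !anyNA(df2_sub)
)
```

If any of the checks fail on real data, the subsetting step is where to look before suspecting rda() itself.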