When calculating a polychoric correlation between two variables with missing values, cor_auto
gives different results with the missing argument set to "listwise" than with "pairwise". For example:
library(qgraph)
set.seed(5)
df <- data.frame(lapply(1:2, function(x) sample(1:6, 100, replace = TRUE)),
                 stringsAsFactors = FALSE)
colnames(df) <- c("a", "b")
# make some missing values
df[10:20, 2] <- NA
# these are different
cor_auto(df[,c("a", "b")], missing = "listwise")
cor_auto(df[,c("a", "b")], missing = "pairwise")
I expected these to give the same output when only two variables are included, since in both cases only rows where both variables are non-missing should be used. Does anyone know where this difference comes from?
The underlying function here is lavaan::lavCor, which estimates thresholds in addition to the polychoric correlation. With missing = "listwise", the thresholds of variable a are estimated using only the rows that have complete data. With missing = "pairwise", they are estimated from all available values of a, including the rows where b is missing. The two sets of thresholds therefore differ, and so do the resulting correlations.
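To see the effect on the thresholds without going through lavaan, here is a minimal base-R sketch. It assumes thresholds are the standard-normal quantiles of the cumulative category proportions, which is the usual first step of a two-step polychoric estimate; lavaan's internals may differ in detail. The helper name thresholds is made up for this illustration and is not part of qgraph or lavaan.

```r
set.seed(5)
df <- data.frame(a = sample(1:6, 100, replace = TRUE),
                 b = sample(1:6, 100, replace = TRUE))
df[10:20, "b"] <- NA

# Hypothetical helper (not a qgraph/lavaan function): thresholds as normal
# quantiles of the cumulative proportions of each category except the last.
thresholds <- function(x, levels = 1:6) {
  x <- x[!is.na(x)]
  counts <- table(factor(x, levels = levels))
  p <- cumsum(counts) / length(x)
  qnorm(p[-length(p)])
}

complete <- complete.cases(df)

# "pairwise"-style: every observed value of a (all 100 rows)
th_all <- thresholds(df$a)

# "listwise"-style: only rows where both a and b are observed (89 rows)
th_complete <- thresholds(df$a[complete])

print(rbind(all_rows = th_all, complete_rows = th_complete))
```

The two rows of the printed matrix differ because the category proportions of a change once the 11 rows with missing b are dropped, even though a itself has no missing values.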