Tags: r, correlation, missing-data

cor_auto giving different results for missing = 'listwise' vs 'pairwise' for correlation with two variables


When calculating a polychoric correlation between two variables with missing values, cor_auto returns different results with missing = 'listwise' than with missing = 'pairwise'. For example:

library(qgraph)
set.seed(5)
df <- data.frame(lapply(1:2, function(x) sample(1:6, 100, replace = TRUE)),
                 stringsAsFactors = FALSE)
colnames(df) <- c("a", "b")

# introduce missing values in b (rows 10 to 20)
df[10:20, 2] <- NA

# these two calls return different correlations
cor_auto(df[, c("a", "b")], missing = "listwise")
cor_auto(df[, c("a", "b")], missing = "pairwise")

I expected the two calls to give the same output when only two variables are included, since in both cases only rows with both variables non-missing should be used. Does anyone know where this difference comes from?


Solution

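  • The underlying function here is lavaan::lavCor, which estimates thresholds in addition to the polychoric correlation. With missing = "listwise", incomplete rows are dropped first, so the thresholds of variable a are estimated from only the 89 complete rows. With missing = "pairwise", the thresholds of a are estimated from all 100 of its observed values, and only the correlation itself is restricted to rows where both variables are observed. Because the polychoric correlation depends on the thresholds, the different thresholds for a lead to the discrepancy.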
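
The threshold difference can be illustrated in base R. This is only a sketch, assuming the thresholds are taken from the cumulative marginal proportions of each variable (the usual first step of a two-step polychoric estimate). The helper marginal_thresholds is hypothetical and is not lavaan's actual code.

```r
set.seed(5)
df <- data.frame(lapply(1:2, function(x) sample(1:6, 100, replace = TRUE)))
colnames(df) <- c("a", "b")
df[10:20, 2] <- NA

# Hypothetical helper (not part of lavaan or qgraph): thresholds computed as
# normal quantiles of the cumulative marginal proportions of an ordinal variable.
marginal_thresholds <- function(x, levels = 1:6) {
  x <- x[!is.na(x)]
  cum_prop <- cumsum(table(factor(x, levels = levels))) / length(x)
  qnorm(cum_prop[-length(cum_prop)])  # drop the last category (cumulative proportion 1)
}

complete <- complete.cases(df)

# Thresholds of a from all of its observed values (the pairwise situation)
thr_all <- marginal_thresholds(df$a)

# Thresholds of a from complete rows only (the listwise situation)
thr_complete <- marginal_thresholds(df$a[complete])

print(round(rbind(all_rows = thr_all, complete_rows = thr_complete), 3))
```

The two rows of printed thresholds differ because rows 10 to 20 of a contribute to the first set but not the second. To inspect the thresholds lavaan itself estimates, lavCor's documentation describes an output argument that can return thresholds instead of the correlation. That route is not shown here and should be checked against the installed lavaan version.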