I'm currently working with a data set that has missing values, but only in a single variable. I want to determine whether they are missing at random, so that I can simply remove those rows from the data frame. To do that, I am trying to find potential correlations between the NAs in the data frame and the values of the other variables. I found the following code online:
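library("VIM")
# load the sleep data from VIM (base R's datasets package also has a `sleep`)
data(sleep, package = "VIM")
# 1 where a value is missing, 0 otherwise
x <- as.data.frame(abs(is.na(sleep)))
head(sleep)
head(x)
# keep only the indicator columns that vary, i.e. variables with some missing values
y <- x[which(sapply(x, sd) > 0)]
# correlations between the missingness indicators
cor(y)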
However, this only shows how the missing values themselves are correlated with each other, which is useful when they are distributed across several variables.
Is there a way to find not the correlation between the missing values in a data frame, but the correlation between the missing values of one variable and the values of another variable? For example, if you have a survey that optionally asks for family income, how could you determine in R whether the missing values are, e.g., correlated with low income?
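library(finalfit)
library(dplyr)
# small example: only C has missing values
df <- data.frame(
  A = c(1, 2, 4, 5),
  B = c(55, 44, 3, 6),
  C = c(NA, 4, NA, 5)
)
# pairs plot of A and C, comparing observed and missing values
df %>%
  missing_pairs("A", "C")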
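If you would rather have numbers than a plot, here is a minimal base R sketch of the same idea: turn the missingness of one variable into a 0/1 indicator and relate it to the observed values of the other variables. The data frame and the column names `age`, `educ` and `income` are made up for illustration. Note that this can only relate missingness to variables that were actually observed; it cannot tell you whether income is missing because income itself is low, since those values are unknown.

```r
# Hypothetical toy data: only `income` has missing values.
df <- data.frame(
  age    = c(23, 35, 47, 52, 29, 61, 44, 38),
  educ   = c(10, 16, 12, 18, 11, 14, 9, 17),
  income = c(NA, 52000, NA, 81000, NA, 64000, 30000, 75000)
)

# 1 where income is missing, 0 where it is observed.
income_missing <- as.integer(is.na(df$income))

# Correlation between the missingness indicator and each other variable
# (a point-biserial correlation, since the indicator is binary).
others <- df[, setdiff(names(df), "income"), drop = FALSE]
miss_cor <- sapply(others, function(v) cor(income_missing, v))
print(miss_cor)

# Compare a variable between rows with and without missing income.
educ_test <- t.test(df$educ ~ income_missing)
print(educ_test)
```

A correlation clearly different from zero, or a clear group difference in the t-test, would suggest the values are not missing completely at random, so simply dropping those rows could bias the results. With a sample this small the output is only illustrative.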