
Simulate missing values with MNAR method in R


I simulated a data set with the following assumptions:

x1 <- rbinom(100, 1, 0.5) # trt (size = 1 gives a 0/1 indicator; size = 0 would return only zeros)

x2 <- rnorm(100,0,1) # metric outcome

df <- data.frame(x1,x2)

Now I'm trying to introduce missing values with two different mechanisms: first "missing completely at random" (MCAR) and second "missing not at random" (MNAR). I tried lots of packages, but it does not work as I expected.

For the first scenario (MCAR) I used ampute() from the mice package:

df_mcar <- ampute(data = df, prop = 0.1, mech = "MCAR", patterns = c(1, 0))$amp

... and it seems to work: only x2 has missing values, with a probability of 10%, independently of x1.

For the second scenario I again want only x2 to have missing values, but this time depending on x1: x2 should be missing in 10% of cases, and only where x1 = 1.

So in variable x2 I want missing values with probability of p=0.1 for x1 = 1 and with probability of p=0 for x1 = 0.

I would be glad for any hint or a simple solution :)

PS: I often read something like prodNA(...) but it does not work


Solution

  • Could probably do something like:

    library(dplyr)

    df_mnar <- df %>%
      mutate(
        # set x2 to NA with probability 0.1, but only in rows where x1 == 1
        x2 = if_else(x1 == 1 & runif(n()) < .1, NA_real_, x2)
      )
    

    My R is currently too busy for me to run the code, though.
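
For anyone who wants to check the idea without dplyr, here is a minimal base R sketch of the same approach, followed by a per-group missingness check. The seed, the sample size of 1000 (larger than the 100 in the question, so the rates are easier to see) and the names `make_na`, `df_miss` and `miss_rate` are assumptions made for illustration, not part of the original question or answer.

```r
set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible

n  <- 1000                # assumption: larger n than in the question
x1 <- rbinom(n, 1, 0.5)   # treatment indicator (0/1)
x2 <- rnorm(n, 0, 1)      # metric outcome
df <- data.frame(x1, x2)

# One uniform draw per row: x2 is set to NA only if x1 == 1 AND the draw is below 0.1
make_na <- df$x1 == 1 & runif(nrow(df)) < 0.1

df_miss <- df
df_miss$x2[make_na] <- NA

# Share of missing x2 values within each level of x1
miss_rate <- tapply(is.na(df_miss$x2), df_miss$x1, mean)
print(miss_rate)
```

The rate for x1 = 0 is exactly 0 by construction, while the rate for x1 = 1 should only be close to 0.1, since each row is dropped independently and the realized share varies from run to run.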