
R: Impute a data.table with the Mode and with mice


# IMPUTING VALUES
library(data.table)
set.seed(1337)
# Shorter vectors (Type and the sample() draws) are recycled to 60 rows
mydt <- data.table(Year  = rep(2000:2005, each = 10),
                   Type  = c("A", "B"),
                   Class = sample(1:5, replace = TRUE),
                   Car   = sample(0:1, replace = TRUE),
                   Boat  = sample(1:4, replace = TRUE)
)
# Set 15 random rows of Car, then 15 random rows of Boat, to NA
naRows <- sample(nrow(mydt), 15)
mydt[naRows, Car := NA]
naRows <- sample(nrow(mydt), 15)
mydt[naRows, Boat := NA]
setkey(mydt, Year, Type)

All of my data are categorical or binary.

I want to do two things.

First, I want to impute Car and Boat with their mode by Type and Class: for every combination of Type and Class, find the mode of Car and of Boat and use it to fill in the missing values.

Second, is it possible to do this with 'mice' as well?

I am looking for both a data.table and a mice solution, because 'mice' may take very long on my large data set.


Solution

  • If I understood your request correctly (you want to replace the missing values with the column's mode within each Type x Class group), this could be a data.table solution:

    # function to calculate the mode, ignoring NAs (ties go to the value seen first)
    stats_mode <- function(x) {
      ux <- unique(x[!is.na(x)])
      ux[which.max(tabulate(match(x, ux)))]
    }
    
    # Generate new column with mode per group
    mydt[, `:=`(mCar  = stats_mode(Car),
                mBoat = stats_mode(Boat)), by = .(Type, Class)]
    
    # Replace missings
    mydt[is.na(Car),  Car  := mCar]
    mydt[is.na(Boat), Boat := mBoat]
    
    # Clean up: drop the helper columns
    mydt[, c("mBoat", "mCar") := NULL]
    

    With big data you probably want to avoid materializing the two helper columns that hold the mode. Instead, you could store the modes in a summary table (one row per Type x Class group) and use it as a lookup table to find the mode for each group.
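
The lookup-table idea could be sketched roughly as follows. This is only a minimal illustration under some assumptions: the small table `dt` is made-up example data (not the asker's `mydt`), the names `modes`, `mCar` and `mBoat` are chosen here for illustration, and `fcoalesce()` requires a reasonably recent data.table version.

```r
library(data.table)

# Mode of the non-missing values (same helper as in the answer)
stats_mode <- function(x) {
  ux <- unique(x[!is.na(x)])
  ux[which.max(tabulate(match(x, ux)))]
}

# Small made-up table for illustration (hypothetical data)
dt <- data.table(
  Type  = c("A", "A", "A", "B", "B", "B"),
  Class = c(1L, 1L, 1L, 2L, 2L, 2L),
  Car   = c(1L, 1L, NA, 0L, NA, 0L),
  Boat  = c(3L, NA, 3L, NA, 2L, 2L)
)

# Summary (lookup) table: one row per Type x Class with the group modes
modes <- dt[, .(mCar = stats_mode(Car), mBoat = stats_mode(Boat)),
            by = .(Type, Class)]

# Update join: keep existing values, fill NAs from the lookup table.
# The i. prefix refers to columns of `modes`; no helper columns are
# ever added to `dt`.
dt[modes, on = .(Type, Class),
   `:=`(Car  = fcoalesce(Car,  i.mCar),
        Boat = fcoalesce(Boat, i.mBoat))]

print(dt)
```

The summary table has only as many rows as there are Type x Class combinations, so it stays small even when the main table is large. Groups in which every value is missing get a mode of NA and are therefore left unfilled.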