Tags: r, loops, hmisc, imputation

Looping over each column to impute data in R does not replace the missing values


I am trying to impute missing values in a data frame with the Hmisc impute function. I can impute the data one column at a time, but looping over the columns fails.

The example below works fine, but I would like to make it dynamic using a function:

impute_marks$col1 <- with(impute_marks, round(impute(col1, mean), 0))

My attempt at a function:

impute_dataframe <- function()
{
  for(i in 1:ncol(impute_marks))
  {
    impute_marks[is.na(impute_marks[,i]), i] <- with(impute_marks, round(impute(impute_marks[,i], mean)),0)
  }
}
impute_dataframe 

There is no error when I run the function, but no imputed data appears in the dataset impute_marks either.


Solution

  • Hmisc::impute is already a function, so why not pass it to apply and skip the for loop?

    library(Hmisc)
    age1 <- c(1,2,NA,4)
    age2 <- c(NA, 4, 3, 1)
    mydf <- data.frame(age1, age2)
    
    mydf
    #>   age1 age2
    #> 1    1   NA
    #> 2    2    4
    #> 3   NA    3
    #> 4    4    1
    
    apply(mydf, 2, function(x) {round(impute(x, mean))})
    #>   age1 age2
    #> 1    1    3
    #> 2    2    4
    #> 3    2    3
    #> 4    4    1
    

    EDIT: apply returns a matrix, so to keep mydf as a data.frame you could coerce the result back and assign it like this:

    mydf <- as.data.frame(apply(mydf, 2, function(x) round(impute(x, mean))))
    

    What I would do instead is use another package, purrr, which is a nice set of tools built around this apply/mapping idea. map_df, for example, always returns a data frame, and there is a whole family of map_* variants that you can see with ?map.

    library(purrr)
    map_df(mydf, ~ round(impute(., mean)))
    

    I know it is preferred to use the base R functions, but purrr makes apply style operations so much easier.
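
For completeness, here is a minimal sketch of how the loop from the question could be made to work. It is one possible fix, not the only one. As far as I can tell, three things kept the original function from changing anything: it was never called (typing `impute_dataframe` without parentheses only prints the definition), the assignment inside it modifies a local copy of impute_marks, and nothing is returned. The version below takes the data frame as an argument (`df` is a name chosen for illustration), assigns the whole imputed column rather than only the NA rows, and returns the result. The sample data is made up to mirror mydf above.

```r
library(Hmisc)

# Made-up sample data with the same values as mydf above
impute_marks <- data.frame(col1 = c(1, 2, NA, 4), col2 = c(NA, 4, 3, 1))

# Takes the data frame as an argument and returns the imputed copy,
# instead of trying to modify a global object from inside the function
impute_dataframe <- function(df) {
  for (i in seq_along(df)) {
    # impute() returns the full column with the NAs filled in, so assign the
    # full column; as.numeric() drops the "impute" class before rounding
    df[[i]] <- round(as.numeric(impute(df[[i]], mean)), 0)
  }
  df
}

# Call the function (note the parentheses) and keep the result
imputed <- impute_dataframe(impute_marks)

# Loop-free base R equivalent: `[]` keeps the data.frame structure
imputed_lapply <- impute_marks
imputed_lapply[] <- lapply(impute_marks, function(x) {
  round(as.numeric(impute(x, mean)), 0)
})
```

To overwrite the original data, assign the result back with `impute_marks <- impute_dataframe(impute_marks)`. Both variants assume every column is numeric; non-numeric columns would need to be skipped or handled separately.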