r, dplyr, imputation

Impute missing values by condition with dplyr


I want to replace missing values with the mean value within the same sex.

For example, if 'patient A - male' has a missing value in pain, it should be replaced with the mean value of pain among males.

rawdata <- rawdata %>%
  mutate(replace_pain = ifelse(is.na(pain) & sex == "male",
                               rawdata %>% 
                                 filter(sex == "male") %>% 
                                 mean(pain, na.rm = TRUE),
                               ifelse(is.na(pain) & sex == "female",
                                      rawdata %>% 
                                        filter(sex == "female") %>% 
                                        mean(pain, na.rm = TRUE),
                                      pain)))

It has two problems.

1) The code is a little messy.

2) It doesn't work and produces the warning below. The problem seems to be in the %>% mean(...) step.

Warning message:
In mean.default(., pain, na.rm = TRUE) :
  argument is not numeric or logical: returning NA

Is there a better way to impute missing values by condition?


Solution

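  • Your code does not work because the pipe passes the whole filtered data frame to mean() as its first argument, and mean() cannot be applied to a data frame. Use summarise(mean(pain, na.rm = TRUE)) instead of calling mean(pain, na.rm = TRUE) directly. Since summarise() returns a one-row data frame, add pull() to extract the value as a plain number.

    rawdata %>%
      mutate(replace_pain = ifelse(is.na(pain) & sex == "male",
                                   # mean pain among males, extracted as a single number
                                   rawdata %>% filter(sex == "male") %>% summarise(mean(pain, na.rm = TRUE)) %>% pull(),
                                   ifelse(is.na(pain) & sex == "female",
                                          # mean pain among females, extracted as a single number
                                          rawdata %>% filter(sex == "female") %>% summarise(mean(pain, na.rm = TRUE)) %>% pull(),
                                          pain)))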
    

    The code is still quite messy; it would probably be nicer to define avg_pain_female and avg_pain_male variables first.
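
The suggestion to compute the averages first could look roughly like the sketch below. The toy rawdata tibble and its values are made up for illustration (the real data set is not shown in the question), and by_variable and by_group are hypothetical names. The second half shows a common alternative that is not part of the answer above: grouping by sex so the mean is computed within each group, without naming each sex explicitly.

```r
library(dplyr)

# Toy data standing in for the asker's rawdata (values are invented)
rawdata <- tibble(
  patient = c("A", "B", "C", "D", "E", "F"),
  sex     = c("male", "male", "male", "female", "female", "female"),
  pain    = c(NA, 4, 6, 2, NA, 9)
)

# Option 1: compute the per-sex means once, then reuse them
avg_pain_male   <- mean(rawdata$pain[rawdata$sex == "male"],   na.rm = TRUE)
avg_pain_female <- mean(rawdata$pain[rawdata$sex == "female"], na.rm = TRUE)

by_variable <- rawdata %>%
  mutate(replace_pain = case_when(
    is.na(pain) & sex == "male"   ~ avg_pain_male,
    is.na(pain) & sex == "female" ~ avg_pain_female,
    TRUE                          ~ pain   # keep observed values as they are
  ))

# Option 2 (alternative, not from the answer): let group_by() compute
# the mean within each sex, so no per-sex branches are needed
by_group <- rawdata %>%
  group_by(sex) %>%
  mutate(replace_pain = ifelse(is.na(pain), mean(pain, na.rm = TRUE), pain)) %>%
  ungroup()
```

Both options should give the same replace_pain column on this toy data. The grouped version scales to more than two categories without extra branches, which is why it may be the tidier choice if the grouping variable has many levels.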