Tags: r, data-analysis, na, data-cleaning

Find and replace values by NA for all columns in DataFrame


Age <- c(90,56,51,'NULL',67,'NULL',51)
Sex <- c('Male','Female','NULL','male','NULL','Female','Male')
Tenure <- c(2,'NULL',3,4,3,3,4)
df <- data.frame(Age, Sex, Tenure)

In the above example, the 'NULL' values are stored as character strings. I am trying to put NA in place of the 'NULL' values. I was able to do it for a single column with df$Age[which(df$Age == 'NULL')] <- NA. However, I don't want to write this for every column.

How can I apply the same logic to all columns so that every 'NULL' value in df is converted to NA? I am guessing that apply, a custom function, or a for loop will do it.


Solution

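  • We can use dplyr to replace the 'NULL' values in every column with NA and then let type.convert re-infer the column types. In R versions before 4.0, data.frame() converts strings to factors by default, so all three columns start out as factor class (in R 4.0 and later they are character). This assumes that Age and Tenure should end up as numeric/integer class.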

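    library(dplyr)
    # Replace 'NULL' with NA in every column, then re-infer each column's type.
    # funs() is deprecated in recent dplyr, so a ~ lambda is used instead.
    # as.is = FALSE keeps non-numeric columns as factors, as in the output below.
    res <- df %>%
             mutate_all(~ type.convert(as.character(replace(., . == 'NULL', NA)), as.is = FALSE))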
    str(res)
    #'data.frame':   7 obs. of  3 variables:
    #$ Age   : int  90 56 51 NA 67 NA 51
    #$ Sex   : Factor w/ 3 levels "Female","male",..: 3 1 NA 2 NA 1 3
    #$ Tenure: int  2 NA 3 4 3 3 4
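
The question also asks whether apply or a custom function would do the job. As a minimal base R sketch of that idea (not part of the answer above), the same replace-then-convert step can be applied to each column with lapply. The names res2 and null_to_na are hypothetical, and the sketch assumes the columns are read as character (stringsAsFactors = FALSE) and that text columns should stay character (as.is = TRUE) rather than become factors.

```r
# Rebuild the example data, keeping strings as character in any R version.
Age <- c(90, 56, 51, 'NULL', 67, 'NULL', 51)
Sex <- c('Male', 'Female', 'NULL', 'male', 'NULL', 'Female', 'Male')
Tenure <- c(2, 'NULL', 3, 4, 3, 3, 4)
df <- data.frame(Age, Sex, Tenure, stringsAsFactors = FALSE)

# Hypothetical helper: turn 'NULL' strings into NA, then re-infer the type.
null_to_na <- function(col) {
  cleaned <- replace(col, col == 'NULL', NA)
  type.convert(as.character(cleaned), as.is = TRUE)
}

# Assigning into res2[] keeps the data.frame structure and column names.
res2 <- df
res2[] <- lapply(df, null_to_na)

# Count the NAs per column.
colSums(is.na(res2))
```

With as.is = TRUE, Sex remains a character column; switch to as.is = FALSE if a factor is preferred, as in the dplyr version.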