r data-cleaning

Replacing "?" with the mean


I have a data set with a column that contains many question marks ("?") rather than NA. How can I replace those question marks with the mean of the numbers in that same column?


Solution

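  • First convert the column to numeric. Each "?" becomes NA, and R warns that NAs were introduced by coercion. If the column is a factor, go through as.character first, because as.numeric on a factor returns the internal level codes instead of the values. Then compute the mean of the remaining values, excluding the NAs (na.rm=TRUE), and assign that mean to the NA positions.

    # as.character() first, so a factor column yields its values, not its level codes
    df$coln <- as.numeric(as.character(df$coln))
    # fill the NA positions with the mean of the known values
    df$coln[is.na(df$coln)] <- mean(df$coln, na.rm=TRUE)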
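
To see the steps end to end, here is a minimal sketch on made-up data. The values are invented for illustration, and `df2` and `col_mean` are hypothetical names that do not come from the question. The second half shows an alternative that may be simpler if you control how the file is read: `read.csv` accepts an `na.strings` argument, so "?" can be turned into NA at import time.

```r
# Toy data (invented): a character column where "?" marks a missing value
df <- data.frame(coln = c("4", "?", "10", "?", "7"), stringsAsFactors = FALSE)

# "?" cannot be parsed as a number, so it becomes NA.
# suppressWarnings() only hides the expected "NAs introduced by coercion" warning.
df$coln <- suppressWarnings(as.numeric(as.character(df$coln)))

# Mean of the known values: (4 + 10 + 7) / 3 = 7
col_mean <- mean(df$coln, na.rm = TRUE)
df$coln[is.na(df$coln)] <- col_mean
print(df$coln)  # values: 4 7 10 7 7

# Alternative: treat "?" as NA while reading, so the column is numeric from the start
df2 <- read.csv(text = "coln\n4\n?\n10\n?\n7", na.strings = "?")
df2$coln[is.na(df2$coln)] <- mean(df2$coln, na.rm = TRUE)
```

Note that the mean is computed once, before any NA is filled, so the imputed values do not influence it.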