Missing values, classification task

I am using the Breast Cancer dataset from UCI, but it contains missing values. Can anyone help me fix this? I am new to ML and don't know much about techniques for handling missing values. Here is the link to the dataset: cancerdata.

I tried this code in R:

data <- read.csv('D:/cancer.csv', header=FALSE)  # Reading the data 

for(i in 1:ncol(data)) {
    data[is.na(data[,i]), i] <- mean(data[,i], na.rm=TRUE)
}

but it gives me an error (sorry if this is trivial; I am really new to this). Here is a screenshot of the …

Thank you for your time and consideration.

Here is the output I have:

Solution

Try the missForest package in R: https://cran.r-project.org/web/packages/missForest/missForest.pdf

It is easy to use, fast, and does a good job of imputing both categorical and numeric values.

For a quick tutorial, see here: https://www.analyticsvidhya.com/blog/2016/03/tutorial-powerful-packages-imputing-missing-values/
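As a minimal sketch of how missForest handles mixed data, the example below uses R's built-in iris data (four numeric columns and one factor), removes 10% of the values with missForest's prodNA() helper, and imputes them again. The seed and the 10% rate are arbitrary choices for illustration.

```r
library(missForest)

set.seed(42)                              # make the random NAs and the forests reproducible
iris.mis <- prodNA(iris, noNA = 0.1)      # randomly set 10% of all entries to NA
colSums(is.na(iris.mis))                  # NAs now appear in every column, including the factor Species

imp <- missForest(iris.mis)               # regression forests for numeric columns, classification forest for Species
iris.imp <- imp$ximp                      # the completed data frame
imp$OOBerror                              # out-of-bag error: NRMSE for numeric columns, PFC for factors

sum(is.na(iris.imp))                      # no NAs remain
```

The OOB error gives a rough idea of imputation quality without needing the true values, which you will not have for real missing data.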

Edit: There are 16 missing values in total, all in column 7 (V7), and they are coded as "?" rather than NA. You can check this with:

data <- read.csv('D:/cancer.csv', header=FALSE)  # Reading the data
sum(data == "?")      # total number of "?" entries in the whole data frame
sum(data$V7 == "?")   # number of "?" entries in column V7
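The "?" coding is also why the loop in the question does not work: read.csv() keeps "?" as text, so V7 is read as a non-numeric column, is.na() never finds anything, and mean() on that column only returns NA with a warning. A minimal base-R sketch, assuming the missing values are coded exactly as "?", is to declare that marker with na.strings when reading; the original column-mean loop then runs. The three-row inline CSV below is a hypothetical stand-in for D:/cancer.csv, in the same 11-column layout, so the example is self-contained.

```r
# Hypothetical stand-in for D:/cancer.csv; "?" marks a missing value in V7
csv_text <- "1000025,5,1,1,1,2,1,3,1,1,2
1002945,5,4,4,5,7,10,3,2,1,2
1057013,8,4,5,1,2,?,7,3,1,4"

# na.strings = "?" turns every "?" into a real NA, so V7 is read as a numeric column
data <- read.csv(text = csv_text, header = FALSE, na.strings = "?")
str(data$V7)

# The loop from the question: replace each column's NAs with that column's mean
for (i in 1:ncol(data)) {
  data[is.na(data[, i]), i] <- mean(data[, i], na.rm = TRUE)
}

data$V7             # the "?" has become mean(c(1, 10)) = 5.5
sum(is.na(data))    # no NAs remain
```

Column-mean imputation is a simple baseline: it ignores relationships between columns, which missForest uses to predict each missing value.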

Note that missForest imputes all NAs in the data, wherever they are. If you want to keep some NAs, separate that part of the data first.
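One way to separate the data, sketched here on iris rather than the cancer data, is to set aside the columns whose NAs must survive, impute the rest, and bind the untouched columns back. Keeping Species out of the imputation is a hypothetical choice for illustration only.

```r
library(missForest)

set.seed(7)
iris.mis <- prodNA(iris, noNA = 0.1)      # 10% NAs scattered over all five columns

keep_cols <- "Species"                    # hypothetical: Species must keep its NAs
to_impute <- iris.mis[, setdiff(names(iris.mis), keep_cols)]   # separate it out first

imputed <- missForest(to_impute)$ximp     # impute only the four numeric columns

# Reattach the untouched column; column order matches the original data
result <- cbind(imputed, iris.mis[, keep_cols, drop = FALSE])
colSums(is.na(result))                    # 0 for the numeric columns; Species keeps its NAs
```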

To impute all NAs:

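data[data == "?"] <- NA                        # recode "?" as a real NA
data$V7 <- as.numeric(as.character(data$V7))   # V7 was read as text because of the "?"
library(missForest)
data <- missForest(data)$ximp                  # impute every NA using random forests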

All the NAs have now been replaced with imputed values. To verify this:

sum(is.na(data))   # should return 0

You can now use this imputed data for your classification task.