r, for-loop, apply, data-cleaning, dplyr

How do I loop through columns in R?


I am trying to tidy up my code for cleaning missing data. I have a dataset with 6 columns, and the code works if I handle the columns one at a time, like this:

mammographic_masses <- mammographic_masses %>%
  mutate(birad = replace(birad, birad== "na", NA)) %>%
  mutate(birad = replace(birad, birad== "N/A", NA))

But when I try to do it in a for loop like this:

for (i in ncol(mammographic_masses)){
  print(class(mammographic_masses[[i]]))
  mammographic_masses <- mammographic_masses %>%
    mutate(mammographic_masses[[,i]] = replace(mammographic_masses[[,i]], mammographic_masses[[,i]] == "na", NA)) %>%
    mutate(mammographic_masses[[,i]] = replace(mammographic_masses[[,i]], mammographic_masses[[,i]] == "N/A", NA))
}

I get an error:

Error: unexpected '=' in:
"  mammographic_masses <- mammographic_masses %>%
    mutate(mammographic_masses[[,i]] ="
>     mutate(mammographic_masses[[,i]] = replace(mammographic_masses[[,i]], mammographic_masses[[,i]] == "N/A", NA))
Error: unexpected '=' in "    mutate(mammographic_masses[[,i]] ="
> }
Error: unexpected '}' in "}"

I also read up on other approaches such as apply, but I can't figure out a way to loop over each column.


Solution

  • Instead of looping, use mutate_all, which applies the same function to every column.

    library(dplyr)
    
    mammographic_masses %>%
      mutate_all(function(x) {is.na(x) <- x %in% c("na", "N/A"); x})
    #     V1   V2   V3   V4
    #1     d    b <NA>    c
    #2     d    b <NA> <NA>
    #3  <NA> <NA>    d    b
    #4     a <NA> <NA>    b
    #5     a    b    d <NA>
    #6     d    c    b    c
    #7     b    b    d <NA>
    #8  <NA> <NA> <NA>    d
    #9     a    d    d <NA>
    #10 <NA>    b    d <NA>
    

    Test data creation code

    set.seed(2020)
    n <- 10
    mammographic_masses <- replicate(4, sample(c(letters[1:4], "na", "N/A"), n, TRUE))
    mammographic_masses <- as.data.frame(mammographic_masses)
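
For completeness, here is a rough sketch of two alternatives, not a definitive implementation. In dplyr 1.0.0 and later, mutate_all is superseded by mutate(across(...)), so the first option rewrites the answer in that style. The second option is a plain for loop, since that is what the question asked for. The original loop appears to fail for two reasons: `for (i in ncol(df))` iterates only once, over the single value ncol(df), and mutate() needs a plain column name on the left of `=`, not an expression like `df[[,i]]`, which is what triggers the "unexpected '='" parse error. The names `df`, `missing_codes`, `cleaned_across`, and `cleaned_loop` are made up for illustration, and `df` is just the test data from above under a shorter name.

```r
library(dplyr)

# Hypothetical stand-in for the real data, built like the test data above
set.seed(2020)
df <- as.data.frame(replicate(4, sample(c(letters[1:4], "na", "N/A"), 10, TRUE)))

# Strings that should be treated as missing (assumed from the question)
missing_codes <- c("na", "N/A")

# Option 1: across() applies the same replacement to every column
cleaned_across <- df %>%
  mutate(across(everything(), ~ replace(.x, .x %in% missing_codes, NA)))

# Option 2: a base R loop over column positions.
# seq_len(ncol(...)) yields 1, 2, ..., ncol, whereas ncol(...) alone
# would make the loop visit only the last column.
cleaned_loop <- df
for (i in seq_len(ncol(cleaned_loop))) {
  col <- cleaned_loop[[i]]
  cleaned_loop[[i]] <- replace(col, col %in% missing_codes, NA)
}

# Both approaches should agree
all.equal(cleaned_across, cleaned_loop)
```

Either result still has to be assigned back to mammographic_masses if the cleaned version should replace the original. The across() form also accepts a narrower column selection than everything() if only some columns need cleaning.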