
Why use two single square brackets side by side in R?


I am trying to learn data-cleaning with simple code.

My central question is: what is the use of two single square brackets side by side?

Here is df as an example.

df <- data.frame(x = c(1:3, NA, NA), y = c(6:9, NA))

The following code is one of the many ways to replace NAs with, say, 99. And I think it's quite simple.

messy <- function(df, impute) {
  for (i in 1:nrow(df)) {
    df[i, ][is.na(df[i, ])] <- impute
  }
  return(df)
}
clean <- messy(df, 99)
clean

  1. Why do I need two single square brackets to locate the NAs?
  2. Why isn't it possible to simplify the code to is.na(df[i, ]) <- impute?
  3. Are there more efficient ways to replace NAs, such as using the apply family?

Many thanks for answering.


Solution

  • That is a very complex way of replacing NAs. You can reduce the function to:

    messy <- function(df, impute){
      df[is.na(df)] <- impute
      df
    }
    
    clean <- messy(df, 99)
    clean
    
    #   x  y
    #1  1  6
    #2  2  7
    #3  3  8
    #4 99  9
    #5 99 99
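
To see why the shorter form works, and what the two brackets in the original loop were doing, it may help to look at the intermediate values. This is only an illustrative sketch; the names mask, df2, row4 and v are made up here and do not appear in the question.

```r
df <- data.frame(x = c(1:3, NA, NA), y = c(6:9, NA))

# is.na() on a data frame returns a logical matrix of the same shape.
mask <- is.na(df)
mask[4, ]            # x is TRUE, y is FALSE

# A single bracket with a logical matrix selects the TRUE cells,
# so this assignment overwrites only the missing values.
df[mask] <- 99

# The original loop did the same thing one row at a time:
# the first bracket, df2[4, ], extracts the row as a one-row data frame;
# the second bracket indexes that row with its own NA mask.
df2 <- data.frame(x = c(1:3, NA, NA), y = c(6:9, NA))
row4 <- df2[4, ]
row4[is.na(row4)] <- 99
row4

# `is.na(v) <- value` is a different operation: it sets the elements
# at positions `value` to NA, rather than replacing existing NAs.
v <- c(10, 20, 30)
is.na(v) <- 2
v                    # 10 NA 30
```

So the two brackets are not special syntax: the first selects a row and the second selects the missing cells within it. The form is.na(df[i, ]) <- impute cannot do the same job, because the replacement form of is.na creates NAs instead of filling them.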
    

    You can use the apply family of functions as well, but they are not needed here since is.na works on data frames directly.
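
For completeness, a column-wise version using lapply might look like the sketch below. The function name impute_lapply is hypothetical and chosen only for this example; it is one possible way to write it, not a recommended replacement for the shorter version.

```r
impute_lapply <- function(df, impute) {
  # Assigning into df[] keeps the data frame structure
  # while every column is replaced by its imputed copy.
  df[] <- lapply(df, function(col) replace(col, is.na(col), impute))
  df
}

df <- data.frame(x = c(1:3, NA, NA), y = c(6:9, NA))
clean <- impute_lapply(df, 99)
clean
```

This processes one column at a time, which can be handy when different columns need different fill values, but for a single fill value it does more work than indexing with is.na(df) directly.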