Tags: r, dplyr, tidyr, tidyeval

Using replace_na() with an indeterminate number of columns


My data frame looks like this:

library(tidyverse)  # provides tibble(), %>% and replace_na()

df <- tibble(x = c(1, 2, NA),
             y = c(1, NA, 3),
             z = c(NA, 2, 3))

I want to replace NA with 0 using tidyr::replace_na(). As this function's documentation makes clear, it's straightforward to do this once you know which columns you want to perform the operation on.

df <- df %>% replace_na(list(x = 0, y = 0, z = 0))

But what if you have an indeterminate number of columns? (I say 'indeterminate' because I'm trying to create a function that does this on the fly using dplyr tools.) If I'm not mistaken, the base R equivalent to what I'm trying to achieve using the aforementioned tools is:

df[, 1:ncol(df)][is.na(df[, 1:ncol(df)])] <- 0

But I always struggle to get my head around this code. Thanks in advance for your help.


Solution

  • We can do this by creating a list of 0s, one per column of the dataset, and naming its elements with the column names:

    library(tidyverse)
    df %>% 
       replace_na(set_names(as.list(rep(0, length(.))), names(.)))
    # A tibble: 3 x 3
    #      x     y     z
    #   <dbl> <dbl> <dbl>
    #1     1     1     0
    #2     2     0     2
    #3     0     3     3
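
Since the question asks for a function that does this on the fly, the list-building step above could be wrapped roughly as follows. This is only a sketch: the name replace_na_all and its value argument are made up for illustration, and it assumes every column can accept the replacement value (recent tidyr versions raise an error if, say, 0 is supplied for a character column).

```r
library(tidyr)
library(tibble)

# Hypothetical helper (not part of tidyr): replace NA in every column
# with the same value, whatever columns the data frame happens to have.
replace_na_all <- function(data, value = 0) {
  # Build list(x = value, y = value, ...) from the existing column names
  replacements <- rep(list(value), ncol(data))
  names(replacements) <- names(data)
  replace_na(data, replacements)
}

df <- tibble(x = c(1, 2, NA),
             y = c(1, NA, 3),
             z = c(NA, 2, 3))

result <- replace_na_all(df)
```

Calling replace_na_all(df, value = -1) would fill with -1 instead.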
    

    Another option is mutate_all (or mutate_at for selected columns, or mutate_if for columns chosen by a condition), applying replace_na to each column:

    df %>%
        mutate_all(replace_na, replace = 0)
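
In dplyr 1.0.0 and later, the scoped verbs (mutate_all, mutate_at, mutate_if) are superseded by across(). A sketch of the same idea in that style, assuming a dplyr version that provides across() and where(); the mixed tibble and its label column are invented here only to show the effect of restricting to numeric columns:

```r
library(dplyr)
library(tidyr)
library(tibble)

df <- tibble(x = c(1, 2, NA),
             y = c(1, NA, 3),
             z = c(NA, 2, 3))

# Counterpart of mutate_all: apply replace_na() to every column
df_all <- df %>%
  mutate(across(everything(), ~ replace_na(.x, 0)))

# Counterpart of mutate_if: only touch numeric columns, so a character
# column is left alone instead of being given a numeric replacement
mixed <- tibble(x = c(1, NA, 3), label = c("a", NA, "c"))
mixed_num <- mixed %>%
  mutate(across(where(is.numeric), ~ replace_na(.x, 0)))
```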
    

    With base R, it is more straightforward:

    df[is.na(df)] <- 0
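
For anyone who finds the base R line hard to read, it may help to split it into two steps. The sketch below uses a plain data.frame with the same values as the question; the intermediate name mask is just for illustration. Note that df[, 1:ncol(df)] in the question's version selects every column, so it is equivalent to df itself.

```r
df <- data.frame(x = c(1, 2, NA),
                 y = c(1, NA, 3),
                 z = c(NA, 2, 3))

# is.na() on a data frame returns a logical matrix of the same shape,
# TRUE wherever a cell is missing
mask <- is.na(df)

# Indexing with that logical matrix targets exactly the TRUE cells,
# so the assignment overwrites only the missing values
df[mask] <- 0
```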