Error with replace_na single column. `replace` must be a list, not a number


I want to replace all NA in one column with 0.

Here is my MWE:

df = structure(list(stage = c("CKD12", "CKD12", "CKD12", "CKD12", 
"CKD3a", "CKD3a"), smokes = c("Current", "Ex-smoker", "Never", 
"Unknown", "Current", "Ex-smoker"), n = c(3, 4, 11, 0, NA, 6)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

I can do this with

df$n= tidyr::replace_na(df$n,0)

But I want to do this in a series of piped expressions. Here is my attempt:

df%>%replace_na(n,0)

But I get this error:

Error in `replace_na()`:
! `replace` must be a list, not a function.
Run `rlang::last_trace()` to see where the error occurred.
Error in `replace_na()`:
! Arguments in `...` must be used.
x Problematic argument:
* ..1 = 0
i Did you misspell an argument name?

What am I doing wrong with this function?


Solution

  • Read the docs (said in a "Star Wars" tone ...).

    Looking at ?replace_na, the relevant arguments are:

        data: A data frame or vector.
    
     replace: If ‘data’ is a data frame, ‘replace’ takes a named list of
              values, with one value for each column that has missing
              values to be replaced. Each value in ‘replace’ will be cast
              to the type of the column in ‘data’ that it being used as a
              replacement in.
    
              If ‘data’ is a vector, ‘replace’ takes a single value. This
              single value replaces all of the missing values in the
              vector. ‘replace’ will be cast to the type of ‘data’.
    

    In your first call replace_na(df$n,0), the first argument is a vector (df$n), for which replace must be a single value (scalar).

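    However, in your second expression df%>%replace_na(n,0), the pipe supplies df as the first argument (data), which is a data frame, so replace must be a named list whose names indicate which columns to search for NA values. Because arguments are matched by position, n is taken as replace and 0 falls through to ..., which is what the two errors report. n is also looked up as an ordinary object rather than as a column, so R finds a function of that name (most likely dplyr::n(), if dplyr is attached). Use list(n = 0), since you want NA values replaced with 0 in the column named n.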

    df %>%
      replace_na(list(n = 0))
    # # A tibble: 6 × 3
    #   stage smokes        n
    #   <chr> <chr>     <dbl>
    # 1 CKD12 Current       3
    # 2 CKD12 Ex-smoker     4
    # 3 CKD12 Never        11
    # 4 CKD12 Unknown       0
    # 5 CKD3a Current       0
    # 6 CKD3a Ex-smoker     6
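
As a complement to the answer above, here is a minimal sketch of a second way to stay inside a pipe: calling the vector form of replace_na() on the column within dplyr::mutate(). It assumes dplyr and tidyr are both installed; the names by_list and by_mutate are only illustrative and do not appear in the original question.

```r
library(dplyr)  # mutate(), %>%
library(tidyr)  # replace_na()

# Same data as the question, rebuilt with tibble() for readability.
df <- tibble(
  stage  = c("CKD12", "CKD12", "CKD12", "CKD12", "CKD3a", "CKD3a"),
  smokes = c("Current", "Ex-smoker", "Never", "Unknown", "Current", "Ex-smoker"),
  n      = c(3, 4, 11, 0, NA, 6)
)

# Data-frame form: `replace` is a named list keyed by column name.
by_list <- df %>% replace_na(list(n = 0))

# Vector form: inside mutate(), `n` refers to the column vector,
# so `replace` can be a single value.
by_mutate <- df %>% mutate(n = replace_na(n, 0))

# Both approaches fill the one NA in `n` with 0.
stopifnot(identical(by_list$n, by_mutate$n))
by_mutate$n
```

The list form is convenient when several columns need different fill values in one call, while the mutate() form reads naturally when the replacement is one step among other column transformations.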