R dplyr: combining mutate_at, vars(ends_with()), ifelse and !is.na


Hi, I have 10 variables with the same ending, and I am trying to use mutate_at to create a new variable based on the data in those variables and assign it back to the data frame. If any of the variables ending in "xyz" has data (i.e. is not NA), I would like to assign the count of non-NA values; otherwise, NA.

df %<>% mutate_at(vars(ends_with("xyz")), funs(new_var = ifelse(!is.na(), 1, NA)))

The code above gives an error because !is.na() is called without an argument, but the function argument of mutate_at (here funs()) needs a function rather than a value. How do I combine these?
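For context, the error goes away once is.na() is given the `.` placeholder, which mutate_at() binds to each selected column in turn. A minimal sketch, assuming dplyr 0.8 or later (where a named list() of ~ lambdas supersedes funs()) and a small hypothetical data frame df:

```r
library(dplyr)

# Hypothetical stand-in data that mirrors the _abc / _xyz naming in the question
df <- data.frame(
  X1_abc = c(NA, NA, NA),
  X1_xyz = c(1, NA, NA),
  X2_xyz = c(1, NA, 1)
)

# Inside the lambda, `.` is the current column, so is.na() receives its argument.
# Naming the function ("flag") makes mutate_at() add new columns
# (X1_xyz_flag, X2_xyz_flag) instead of overwriting the originals.
flags <- df %>%
  mutate_at(vars(ends_with("xyz")), list(flag = ~ ifelse(!is.na(.), 1, NA)))

print(flags)
```

This runs, but because mutate_at() works one column at a time it yields one flag column per _xyz variable rather than a single count; a per-row count across columns needs a row-wise tool such as rowSums().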

Edit: Here is the reproducible example and desired output:

# A tibble: 6 x 6
  1_abc 1_xyz 2_abc 2_xyz 3_abc 3_xyz
1    NA     1    NA     1    NA    NA
2    NA    NA    NA    NA    NA    NA
3    NA    NA    NA     1    NA    NA
4    NA    NA    NA    NA    NA    NA
5    NA    NA    NA    NA    NA    NA
6    NA     1    NA    NA    NA    NA

The desired output is a new variable, such as xyz_num, whose value is NA when all _xyz variables are NA, and otherwise the count of non-NA _xyz variables in that row.

# A tibble: 6 x 7
  1_abc 1_xyz 2_abc 2_xyz 3_abc 3_xyz xyz_num
1    NA     1    NA     1    NA    NA       2
2    NA    NA    NA    NA    NA    NA      NA
3    NA    NA    NA     1    NA    NA       1
4    NA    NA    NA    NA    NA    NA      NA
5    NA    NA    NA    NA    NA    NA      NA
6    NA     1    NA    NA    NA    NA       1

Solution

  • With dplyr, you can try something like:

    library(dplyr)

    df1 %>%
      select(ends_with("_xyz")) %>%
      mutate(nnums = rowSums(!is.na(.)))
    

    assuming the input is

    df1 <- structure(list(X1_abc = c(NA, NA, NA, NA, NA, NA), X1_xyz = c(1, 
    NA, NA, NA, NA, 1), X2_abc = c(NA, NA, NA, NA, NA, NA), X2_xyz = c(1, 
    NA, 1, NA, NA, NA), X3_abc = c(NA, NA, NA, NA, NA, NA), X3_xyz = c(NA, 
    NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
    -6L))
    

    it returns

      X1_xyz X2_xyz X3_xyz nnums
    1      1      1     NA     2
    2     NA     NA     NA     0
    3     NA      1     NA     1
    4     NA     NA     NA     0
    5     NA     NA     NA     0
    6      1     NA     NA     1
    

    I hope you can adapt the code to keep the columns you want.

    EDIT 1:

    To keep all columns, try:

    library(magrittr)  # provides the %<>% assignment pipe
    df1 %<>%
      mutate(nnums = rowSums(!is.na(select(df1, ends_with("_xyz")))))
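The counts from rowSums() come out as 0, not NA, for rows where every _xyz column is missing, whereas the desired output in the question has NA there. A minimal sketch that closes that gap, assuming dplyr's na_if() to turn a count of 0 into NA; the column name xyz_num follows the question, and df1 is rebuilt inline from the dput() input rather than loaded from anywhere:

```r
library(dplyr)

# Rebuild the six-row example; length-1 NA columns are recycled
# to six rows by data.frame()
df1 <- data.frame(
  X1_abc = NA,
  X1_xyz = c(1, NA, NA, NA, NA, 1),
  X2_abc = NA,
  X2_xyz = c(1, NA, 1, NA, NA, NA),
  X3_abc = NA,
  X3_xyz = NA
)

# Count non-NA values across the *_xyz columns per row, then replace a
# count of 0 (all _xyz values missing) with NA, as the question asks
df1 <- df1 %>%
  mutate(xyz_num = na_if(rowSums(!is.na(select(df1, ends_with("_xyz")))), 0))

print(df1)
```

Rows 2, 4 and 5 then show NA in xyz_num instead of 0, while rows 1, 3 and 6 keep their counts of 2, 1 and 1.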