Search code examples
rdataframedplyrcountmutate

How do I add a new column through a conditional mutate but preserve the original dataframe?


I have a large dataframe (df) containing 500+ rows, 50+ columns/variables but only want to target specific variables.

targ_vars <- c("all3a1", "3a1_arc",
              "all3b1", "3b1_arc",
              "all3c1", "3c1_arc")

The vector above contains the variables which have frequency data (i.e. multiple rows with 1,2,3 etc.)

I want to add a new count column in the original large dataframe (df) which contains the row sum of any non-NA value specifically for those select variables in "targ_vars".

Again, I'm not trying to add the value of the actual frequency data across each of those variables, but moreso just a sum of any non-NA value per row (i.e. 1,2,NA,7,NA,1 = total row count of 4 non-NA).

I've gotten as far as this:

df <- df %>%
        select(targ_vars) %>%
        mutate(targ_var_count = rowSums(!is.na(.), na.rm = TRUE))

The problem is I'm not sure how to "deselect" the variables I used to run the mutate calculation. The line above would result in overwriting the entire original dataframe (df) containing 50+ columns/vars, and placing back only the selected 6 variables in (targ_vars) plus the new (targ_var_count) variable that mutate calculated.

Essentially, I just want to focus on that last mutate line, and plop that new count column back into the original (df).

I tried something like the one below but it ended up giving me a list when I call "df$targcount" instead of just 1 rowSum column:

df$targcount <- df %>%
    select(targ_vars) %>%
    mutate(targcount = rowSums(!is.na(.), na.rm = TRUE))

Any help/tips would be appreciated.


Solution

  • You could use dplyr::across to get the count of non NA values for just your targ_vars columns.

    Using some fake random example data:

    set.seed(123)
    
    dat <- data.frame(
      a = sample(c(0, NA), 10, replace = TRUE),
      b = sample(c(0, NA), 10, replace = TRUE),
      c = sample(c(0, NA), 10, replace = TRUE),
      d = sample(c(0, NA), 10, replace = TRUE)
    )
    
    targ_vars <- c("c", "d")
    
    library(dplyr, w = FALSE)
    
    dat %>%
      mutate(targcount = rowSums(across(all_of(targ_vars), ~ !is.na(.x))))
    #>     a  b  c  d targcount
    #> 1   0 NA  0  0         2
    #> 2   0 NA NA NA         0
    #> 3   0 NA  0  0         2
    #> 4  NA  0  0 NA         1
    #> 5   0 NA  0 NA         1
    #> 6  NA  0  0  0         2
    #> 7  NA NA NA  0         1
    #> 8  NA  0 NA  0         1
    #> 9   0  0  0  0         2
    #> 10  0  0 NA NA         0