Search code examples
rvariablesdplyrtidyversedata-cleaning

How to row sum with factors in R?


I have data like so:

df <- data.frame(id = c("001", "002", "003", "004"),
                 banana = c("banana", NA, NA, NA),
                 apple = c(NA, "apple", "apple", NA),
                 orange = c("orange", NA, NA, NA),
                 mango = c(NA, NA, NA, "mango"))

 id banana apple orange mango
001 banana    NA orange    NA
002     NA apple     NA    NA
003     NA apple     NA    NA
004     NA    NA     NA mango

I'd like to consolidate the variables into a variable flavor in the following manner:

 id banana apple orange mango flavor
001 banana    NA orange    NA  mixed
002     NA apple     NA    NA  apple
003     NA apple     NA    NA  apple
004     NA    NA     NA mango  mango

How do I do this? Any help is appreciated, although a generalisable dplyr solution would be great.

Edit: I need a solution robust to having other columns in my dataframe too.


Solution

  • As you have requested, here's a dplyr solution.

    library(dplyr)
    
    df %>% 
      pivot_longer(-id) %>% 
      mutate(flavor = ifelse(sum(is.na(value)) == ncol(df) - 2, value[!is.na(value)], "mixed"), .by = id) %>% 
      pivot_wider() %>% 
      relocate(-flavor) # move the flavor column to the last
    
    # A tibble: 4 × 6
      id    banana apple orange mango flavor
      <chr> <chr>  <chr> <chr>  <chr> <chr> 
    1 001   banana NA    orange NA    mixed 
    2 002   NA     apple NA     NA    apple 
    3 003   NA     apple NA     NA    apple 
    4 004   NA     NA    NA     mango mango