Search code examples
rrowmultiple-columnsbind

R combine rows where certain value overrules other values


I have a dataframe as below. I want to combine rows based on duplicates in column person. However, is it possible for specified columns (Beer, Cola, Wodka in this case) that a certain value (1 in this case) overrules other values (0 in this case).

Current dataframe:

person <- c("John", "John", "Alex", "Nicole", "Nicole")
Sex <- c("M","M","W", "W", "W")
Beer <- c(1,1,1,1,0)
Cola <- c(0,1,0,0,0)
Wodka <- c(0,1,0,0,1)
df <- data.frame(person,Sex,Beer,Cola,Wodka)

Outcome should be:

person <- c("John", "Alex", "Nicole")
Sex <- c("M", "W", "W")
Beer <- c(1,1,1)
Cola <- c(1,0,0)
Wodka <- c(1,0,1)
df <- data.frame(person,Sex,Beer,Cola,Wodka)

Thanks.


Solution

  • Using dplyr, you can summarise() to get one row per person, and take the maximum of the specified columns:

    library(tidyverse)
    
    person <- c("John", "John", "Alex", "Nicole", "Nicole")
    Sex <- c("M", "M", "W", "W", "W")
    Beer <- c(1, 1, 1, 1, 0)
    Cola <- c(0, 1, 0, 0, 0)
    Wodka <- c(0, 1, 0, 0, 1)
    
    df <- data.frame(person, Sex, Beer, Cola, Wodka)
    
    df %>% 
      group_by(person, Sex) %>% 
      summarise(across(c(Beer, Cola, Wodka), max))
    #> `summarise()` regrouping output by 'person' (override with `.groups` argument)
    #> # A tibble: 3 x 5
    #> # Groups:   person [3]
    #>   person Sex    Beer  Cola Wodka
    #>   <chr>  <chr> <dbl> <dbl> <dbl>
    #> 1 Alex   W         1     0     0
    #> 2 John   M         1     1     1
    #> 3 Nicole W         1     0     1