I have a dataframe as below. I want to combine rows based on duplicates in column person. However, is it possible for specified columns (Beer, Cola, Wodka in this case) that a certain value (1 in this case) overrules other values (0 in this case).
Current dataframe:
person <- c("John", "John", "Alex", "Nicole", "Nicole")
Sex <- c("M","M","W", "W", "W")
Beer <- c(1,1,1,1,0)
Cola <- c(0,1,0,0,0)
Wodka <- c(0,1,0,0,1)
df <- data.frame(person,Sex,Beer,Cola,Wodka)
Outcome should be:
person <- c("John", "Alex", "Nicole")
Sex <- c("M", "W", "W")
Beer <- c(1,1,1)
Cola <- c(1,0,0)
Wodka <- c(1,0,1)
df <- data.frame(person,Sex,Beer,Cola,Wodka)
Thanks.
Using dplyr, you can summarise()
to get one row per person, and take
the maximum of the specified columns:
library(tidyverse)
person <- c("John", "John", "Alex", "Nicole", "Nicole")
Sex <- c("M", "M", "W", "W", "W")
Beer <- c(1, 1, 1, 1, 0)
Cola <- c(0, 1, 0, 0, 0)
Wodka <- c(0, 1, 0, 0, 1)
df <- data.frame(person, Sex, Beer, Cola, Wodka)
df %>%
group_by(person, Sex) %>%
summarise(across(c(Beer, Cola, Wodka), max))
#> `summarise()` regrouping output by 'person' (override with `.groups` argument)
#> # A tibble: 3 x 5
#> # Groups: person [3]
#> person Sex Beer Cola Wodka
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 Alex W 1 0 0
#> 2 John M 1 1 1
#> 3 Nicole W 1 0 1