I am attempting to merge two rows of data into one row of aggregated data. Much of the values in the two rows are the same. Where they differ, I want to append data together.
I started by using add_row()
to create a new blank row that would serve as my target for aggregated data (with all NA
values). Then, I tried to write a call that would evaluate the content of the two rows in each field. If they were the same, it would populate that value into the added blank row. If they were not the same, they would concatenate the values together with a separator.
I was able to get the code to evaluate each value in a column against the other value and return the output I wanted, but it overwrote the existing values in the original two rows, and did not populate into the blank third row I created. This is where I got stuck, and I'm not sure how to direct the code to populate the aggregated value into the blank cell of the created row.
I'm reluctant to pivot my data to create a new column to populate, and then to pivot it back because I don't want to lose my ID field (and I'm also perpetually bad at pivoting) but that might be the solution, unless there's a more elegant way to do it.
Any advice is appreciated. Thanks!
EDIT: With apologies for forgetting, this is my desired output:
ID Email Gender Location Ice_Cream_Flavor
1 janedoe@email.com Female Los Angeles Strawberry
2 jessicadoe@email.com Female Los Angeles Rocky Road
NA NA Female Los Angeles Strawberry; Rocky Road
# Sample Data
library(tidyverse)
test<-data.frame("ID"=c("1","2"),
"Email"=c("janedoe@email.com","jessicadoe@email.com"),
"Gender"=c("Female","Female"),
"Location"=c("Los Angeles","Los Angeles"),
"Ice_Cream_Flavor"=c("Strawberry","Rocky Road"))
## Original Table
# A tibble: 2 × 5
ID Email Gender Location Ice_Cream_Flavor
<chr> <chr> <chr> <chr> <chr>
1 1 janedoe@email.com Female Los Angeles Strawberry
2 2 jessicadoe@email.com Female Los Angeles Rocky Road
## Attempted Code
merge<-test%>%
add_row()%>% # creating a new blank row to populate with aggregated data
mutate(across(.cols = -c(`ID`,
`Email`), # calling up all columns except these specific ones
~ case_when(!is.na(`ID`) & .[1]==.[2] ~ .[1], # Checking for equality in the values and returning the value in the first row
!is.na(`ID`) & .[1]!=.[2] ~ paste(.[1],.[2], sep = "; "), # appending unequal values together
TRUE~NA)))
## Output
# A tibble: 3 × 5
ID Email Gender Location Ice_Cream_Flavor
<chr> <chr> <chr> <chr> <chr>
1 1 janedoe@email.com Female Los Angeles Strawberry; Rocky Road
2 2 jessicadoe@email.com Female Los Angeles Strawberry; Rocky Road
3 NA NA NA NA NA
Without pivoting one potential option is:
library(tidyverse)
test<-data.frame("ID"=c("1","2"),
"Email"=c("janedoe@email.com","jessicadoe@email.com"),
"Gender"=c("Female","Female"),
"Location"=c("Los Angeles","Los Angeles"),
"Ice_Cream_Flavor"=c("Strawberry","Rocky Road"))
test %>%
mutate(across(everything(), ~ifelse(length(unique(.x)) > 1, paste(.x, collapse = ";"), .x))) %>%
distinct() %>%
bind_rows(test, .)
#> ID Email Gender Location Ice_Cream_Flavor
#> 1 1 janedoe@email.com Female Los Angeles Strawberry
#> 2 2 jessicadoe@email.com Female Los Angeles Rocky Road
#> 3 1;2 janedoe@email.com;jessicadoe@email.com Female Los Angeles Strawberry;Rocky Road
Created on 2024-12-17 with reprex v2.1.0
Is that your desired outcome with this example dataset? Or have I misunderstood?