I have a dataframe where each row represents a person, and the columns represent the names they have gone by. Some of the values are NAs or are duplicates. The data looks like the dataframe immediately below.
Name | Name1 | Name2 | Name3 | Name4 |
---|---|---|---|---|
Tom | Tom | Thomas | Tom | Tommy |
Jim | NA | James | NA | Jimmy |
Dave | Dave | David | NA | Davey |
Tim | NA | Timothy | Tim | Timmy |
Rob | Rob | NA | Rob | Robby |
Sam | NA | NA | Sam | NA |
I would like to combine the unique names from each row and put them into a new column, where each name only occurs once. I know i can use the paste function to produce a column where all the text values appear like this:
Name | Name1 | Name2 | Name3 | Name4 | unique |
---|---|---|---|---|---|
Tom | Tom | Thomas | NA | Tommy | Tom, Tom, Thomas, NA, Tommy |
but I don't want the same text to appear multiple times in the unique column. How can I combine the row data such that each name only occurs once in the new $unique cell?
Name | Name1 | Name2 | Name3 | Name4 | unique |
---|---|---|---|---|---|
Tom | Tom | Thomas | Tom | Tommy | Tom, Thomas, Tommy |
Jim | NA | James | NA | Jimmy | Jim, James, Jimmy |
Dave | Dave | David | NA | Davey | Dave, David, Davey |
Tim | NA | Timothy | Tim | Timmy | Tim, Timothy, Timmy |
Rob | Rob | NA | Rob | Robby | Rob, Robert, Robby |
Sam | NA | NA | Sam | NA | Sam |
Using tidyverse
library(dplyr)
df1 %>%
rowwise %>%
mutate(unique = toString(unique(na.omit(c_across(everything()))))) %>%
ungroup
-output
# A tibble: 6 × 6
Name Name1 Name2 Name3 Name4 unique
<chr> <chr> <chr> <chr> <chr> <chr>
1 Tom Tom Thomas Tom Tommy Tom, Thomas, Tommy
2 Jim <NA> James <NA> Jimmy Jim, James, Jimmy
3 Dave Dave David <NA> Davey Dave, David, Davey
4 Tim <NA> Timothy Tim Timmy Tim, Timothy, Timmy
5 Rob Rob <NA> Rob Robby Rob, Robby
6 Sam <NA> <NA> Sam <NA> Sam