Search code examples
rdataframetextmergeunique

How can I paste text from dataframe rows, keeping only unique values in R


I have a dataframe where each row represents a person, and the columns represent the names they have gone by. Some of the values are NAs or are duplicates. The data looks like the dataframe immediately below.

Name Name1 Name2 Name3 Name4
Tom Tom Thomas Tom Tommy
Jim NA James NA Jimmy
Dave Dave David NA Davey
Tim NA Timothy Tim Timmy
Rob Rob NA Rob Robby
Sam NA NA Sam NA

I would like to combine the unique names from each row and put them into a new column, where each name only occurs once. I know i can use the paste function to produce a column where all the text values appear like this:

Name Name1 Name2 Name3 Name4 unique
Tom Tom Thomas NA Tommy Tom, Tom, Thomas, NA, Tommy

but I don't want the same text to appear multiple times in the unique column. How can I combine the row data such that each name only occurs once in the new $unique cell?

Name Name1 Name2 Name3 Name4 unique
Tom Tom Thomas Tom Tommy Tom, Thomas, Tommy
Jim NA James NA Jimmy Jim, James, Jimmy
Dave Dave David NA Davey Dave, David, Davey
Tim NA Timothy Tim Timmy Tim, Timothy, Timmy
Rob Rob NA Rob Robby Rob, Robert, Robby
Sam NA NA Sam NA Sam

Solution

  • Using tidyverse

    library(dplyr)
    df1 %>% 
     rowwise %>% 
      mutate(unique = toString(unique(na.omit(c_across(everything()))))) %>% 
      ungroup
    

    -output

    # A tibble: 6 × 6
      Name  Name1 Name2   Name3 Name4 unique             
      <chr> <chr> <chr>   <chr> <chr> <chr>              
    1 Tom   Tom   Thomas  Tom   Tommy Tom, Thomas, Tommy 
    2 Jim   <NA>  James   <NA>  Jimmy Jim, James, Jimmy  
    3 Dave  Dave  David   <NA>  Davey Dave, David, Davey 
    4 Tim   <NA>  Timothy Tim   Timmy Tim, Timothy, Timmy
    5 Rob   Rob   <NA>    Rob   Robby Rob, Robby         
    6 Sam   <NA>  <NA>    Sam   <NA>  Sam