r  dataframe  dplyr  data.table  tidyverse

How to remove duplicate rows while preserving their information in another column in R?


I am trying to remove duplicates from a dataset, but I just found out that the duplicated rows are not exactly the same: they share an id but differ in another column. I would therefore like to preserve that information by moving it into an additional column. For example, suppose I have the following data:

df <- data.frame(id = c("a", "a", "b", "c", "c", "d"),
                color = c("red", "blue", "green", "blue", "green","red"))

> df
  id color
1  a   red
2  a  blue
3  b green
4  c  blue
5  c green
6  d   red

Now I would like each id to appear only once, with the extra information placed in another column. The result should look something like this:

> df2
  id color color2
1  a   red   blue
2  b green       
3  c  blue  green
4  d   red   

Is there a simple way to accomplish this?


Solution

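  • Here's one way to do this with tidyverse packages: number the rows within each id, then pivot those numbers into columns. Note that the .by argument of mutate() requires dplyr 1.1.0 or later.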

    library(dplyr)
    library(tidyr)
    
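    df %>%
      # number the rows within each id: 1, 2, ...
      mutate(n = row_number(),
             .by = id) %>%
      # spread those numbers into columns color_1, color_2, ...
      pivot_wider(
        names_from = n,
        names_prefix = 'color_',
        values_from = color
      )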
    #> # A tibble: 4 × 3
    #>   id    color_1 color_2
    #>   <chr> <chr>   <chr>  
    #> 1 a     red     blue   
    #> 2 b     green   <NA>   
    #> 3 c     blue    green  
    #> 4 d     red     <NA>
    

    Created on 2023-10-18 with reprex v2.0.2
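
Since the question is also tagged data.table, the same idea (number the rows within each id, then reshape to wide) can be sketched with that package. This is only an alternative sketch, not part of the answer above: the names dt and wide are made up for illustration, and it assumes the example df from the question. rowid() supplies the per-id counter and dcast() does the reshaping.

```r
library(data.table)

# Example data from the question
df <- data.frame(id = c("a", "a", "b", "c", "c", "d"),
                 color = c("red", "blue", "green", "blue", "green", "red"))

# dt and wide are illustrative names, not from the original answer
dt <- as.data.table(df)

# rowid(id, prefix = "color") labels the rows within each id as
# "color1", "color2", ...; dcast() turns those labels into columns
wide <- dcast(dt, id ~ rowid(id, prefix = "color"), value.var = "color")

print(wide)
```

Ids with only one color get NA in the second column, just as in the tidyverse version. If blank cells are preferred, as in the desired output in the question, dcast() has a fill argument that could be used for that.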