I am trying to remove duplicates from a dataset, and I just found out that the rows sharing an id are not exactly the same. I would therefore like to preserve the extra information by moving it into another column. For example, suppose I have the following data:
df <- data.frame(id = c("a", "a", "b", "c", "c", "d"),
                 color = c("red", "blue", "green", "blue", "green", "red"))
> df
  id color
1  a   red
2  a  blue
3  b green
4  c  blue
5  c green
6  d   red
Now, I would like each id to appear only once, with the extra information placed in another column. The result should look something like this:
> df2
  id color color2
1  a   red   blue
2  b green
3  c  blue  green
4  d   red
Is there a simple way to accomplish this?
Here's one way to do this with tidyverse packages (the .by argument of mutate() requires dplyr 1.1.0 or later):
library(dplyr)
library(tidyr)
df %>%
  # number the rows within each id: 1, 2, ...
  mutate(n = row_number(),
         .by = id) %>%
  # spread the colors into one column per row number
  pivot_wider(
    names_from = n,
    names_prefix = 'color_',
    values_from = color
  )
#> # A tibble: 4 × 3
#>   id    color_1 color_2
#>   <chr> <chr>   <chr>
#> 1 a     red     blue
#> 2 b     green   <NA>
#> 3 c     blue    green
#> 4 d     red     <NA>
Created on 2023-10-18 with reprex v2.0.2
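If you'd rather not load extra packages, the same idea (number the rows within each id, then spread them into columns) can probably be sketched in base R as well. This is only a rough sketch under a few assumptions: it uses the df from the question, and the helper column n and the names df2 and df2_blank are my own choices for illustration, not anything from the code above.

```r
# Same data as in the question; stringsAsFactors = FALSE keeps the
# columns as character on R versions older than 4.0 too
df <- data.frame(id = c("a", "a", "b", "c", "c", "d"),
                 color = c("red", "blue", "green", "blue", "green", "red"),
                 stringsAsFactors = FALSE)

# Number the rows within each id (1, 2, ...) in order of appearance
df$n <- ave(seq_along(df$id), df$id, FUN = seq_along)

# Go from long to wide: one row per id, one color column per value of n
df2 <- reshape(df, idvar = "id", timevar = "n",
               direction = "wide", sep = "_")

# Optional: show empty strings instead of NA, as in the desired output
df2_blank <- df2
df2_blank[is.na(df2_blank)] <- ""
```

Here df2 should end up with the columns id, color_1 and color_2, with NA wherever an id has no second color. The row names are carried over from the original rows, so you may want rownames(df2) <- NULL afterwards.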