I have a data.frame
that is not unique for a subset of its columns and what I want to do is to obtain the unique part of the non-unique columns and corresponding aggregate (via collapse
) the unique columns.
Here's an example:
library(dplyr)
set.seed(1)
df <- data.frame(v1=rnorm(5), v2=rnorm(5), v3=rnorm(5), id=paste0("id",1:5))
#Replicate a couple of rows adding different ids:
df <- df %>% rbind(df[sample(5,2,replace = F),] %>% dplyr::mutate(id=paste0("id",6:7)))
> df
v1 v2 v3 id
1 -0.6264538 -0.8204684 1.5117812 id1
2 0.1836433 0.4874291 0.3898432 id2
3 -0.8356286 0.7383247 -0.6212406 id3
4 1.5952808 0.5757814 -2.2146999 id4
5 0.3295078 -0.3053884 1.1249309 id5
6 -0.8356286 0.7383247 -0.6212406 id6
7 0.3295078 -0.3053884 1.1249309 id7
Since rows 3 and 5 are replicated the resulting data.frame
would be:
> data.frame(df[1:5,] %>% dplyr::select(-id),id=c("id1","id2","id3,id6","id4","id5,id7"))
v1 v2 v3 id
1 -0.6264538 -0.8204684 1.5117812 id1
2 0.1836433 0.4874291 0.3898432 id2
3 -0.8356286 0.7383247 -0.6212406 id3,id6
4 1.5952808 0.5757814 -2.2146999 id4
5 0.3295078 -0.3053884 1.1249309 id5,id7
Is there a dplyr
or tidyr
way for doing this?
df%>%
group_by(v1,v2,v3)%>%
summarise(id=toString(id))
# A tibble: 5 x 4
# Groups: v1, v2 [?]
v1 v2 v3 id
<dbl> <dbl> <dbl> <chr>
1 -0.836 0.738 -0.621 id3, id6
2 -0.626 -0.820 1.51 id1
3 0.184 0.487 0.390 id2
4 0.330 -0.305 1.12 id5, id7
5 1.60 0.576 -2.21 id4
df%>%
group_by(v1,v2,v3)%>%
summarise(id=toString(id))%>%data.frame()
v1 v2 v3 id
1 -0.8356286 0.7383247 -0.6212406 id3, id6
2 -0.6264538 -0.8204684 1.5117812 id1
3 0.1836433 0.4874291 0.3898432 id2
4 0.3295078 -0.3053884 1.1249309 id5, id7
5 1.5952808 0.5757814 -2.2146999 id4