I have some data that looks like this (code for input at the end):
#> artist album year source id
#> 1 Beatles Sgt. Pepper's 1967 amazon B0025KVLTM
#> 2 Beatles Sgt. Pepper's 1967 spotify 6QaVfG1pHYl1z15ZxkvVDW
#> 3 Beatles Sgt. Pepper's 1967 amazon B06WGVMLJY
#> 4 Rolling Stones Sticky Fingers 1971 spotify 29m6DinzdaD0OPqWKGyMdz
I would like to fix the 'id' column (which includes ids from multiple sources, as shown in the 'source' column.
This should be a straightfoward spread()
, but the complication is that sometimes we have a duplicate id from the exact same source: see row 1 and row 3 above.
Is there an easy way to do the spread()
and put the duplicate id in a new column?
My desired outcome would be:
#> artist album year source amazon_id amazon_id_2
#> 1 Beatles Sgt. Pepper's 1967 amazon B0025KVLTM B06WGVMLJY
#> 2 Rolling Stones Sticky Fingers 1971 spotify <NA> <NA>
#> spotify
#> 1 6QaVfG1pHYl1z15ZxkvVDW
#> 2 29m6DinzdaD0OPqWKGyMdz
The code below is to input the sample data:
df <- data.frame(stringsAsFactors=FALSE,
artist = c("Beatles", "Beatles", "Beatles", "Rolling Stones"),
album = c("Sgt. Pepper's", "Sgt. Pepper's", "Sgt. Pepper's",
"Sticky Fingers"),
year = c(1967, 1967, 1967, 1971),
source = c("amazon", "spotify", "amazon", "spotify"),
id = c("B0025KVLTM", "6QaVfG1pHYl1z15ZxkvVDW", "B06WGVMLJY",
"29m6DinzdaD0OPqWKGyMdz")
)
df
Here is one approach.
df %>%
group_by(artist,source) %>%
mutate(rownum = row_number()) %>%
unite(source, source, rownum, sep="_") %>%
spread(source,id)
# A tibble: 2 x 6
# Groups: artist [2]
artist album year amazon_1 amazon_2 spotify_1
<chr> <chr> <dbl> <chr> <chr> <chr>
1 Beatles Sgt. Pepper's 1967 B0025KVLTM B06WGVMLJY 6QaVfG1pHYl1z15ZxkvVDW
2 Rolling Stones Sticky Fingers 1971 NA NA 29m6DinzdaD0OPqWKGyMdz