I have a column to split made like A, with a different number of elements(genres) for each observations(song). I can I split the column without specifing the destination columns in R?
Column A |
---|
"['hip hop', 'pop', 'pop rap', 'r&b', 'southern hip hop', 'trap', 'trap soul']" |
"['dance pop', 'girl group', 'pop', 'post-teen pop', 'talent show', 'uk pop']" |
I'm trying to have a result like that
genre 1 | genre 2 | genre ... | genre 6 | genre 7 |
---|---|---|---|---|
Hip Hop | pop | .... | trap | Neo soul |
dance pop | girl group | .... | Uk pop | N/A |
with the number of new columns equal to the maximum number of genre a song can have( for example if the song with more genres has ten genres, I should have have ten columns) .
Another option is to create a dummy column for every genre found in the column
Hip Hop | Pop | Pop Rap | r&b | .... |
---|---|---|---|---|
1 | 1 | 1 | 1 | .... |
0 | 1 | 0 | 0 | .... |
I tried with separate in R but it gave me error
In base R
, we could use read.csv
after removing the [
, ]
, and quotes ('
, "
)
df2 <- read.csv(text = gsub('\\[|\\]|\'|"', "", df1$ColumnA),
header = FALSE, na.strings = "", col.names = paste0("genre", 1:7))
-output
df2
genre1 genre2 genre3 genre4 genre5 genre6 genre7
1 hip hop pop pop rap r&b southern hip hop trap trap soul
2 dance pop girl group pop post-teen pop talent show uk pop <NA>
The second dataset can be created with mtabulate
on the output above
library(qdapTools)
mtabulate(as.data.frame(t(df2)))
-output
girl group pop pop rap post-teen pop r&b southern hip hop talent show trap trap soul uk pop dance pop hip hop
V1 0 1 1 0 1 1 0 1 1 0 0 1
V2 1 1 0 1 0 0 1 0 0 1 1 0
df1 <- structure(list(ColumnA = c("['hip hop', 'pop', 'pop rap', 'r&b',
'southern hip hop', 'trap', 'trap soul']",
"['dance pop', 'girl group', 'pop', 'post-teen pop', 'talent show', 'uk pop']"
)), class = "data.frame", row.names = c(NA, -2L))