Given the following example data ...
id Proteins
522 Q9UHC7-4;Q9UHC7-3;Q9UHC7-2;Q9UHC7
523 Q9UHV7
524 Q9Y6T7-2;Q9Y6T7
525 Q9Y6T7-2;Q9Y6T7
... I would like to create a third column in which each id is repeated once
for every semicolon-delimited value in that row. More specifically, something like this:
id Proteins newCol
522 Q9UHC7-4;Q9UHC7-3;Q9UHC7-2;Q9UHC7 522;522;522;522
523 Q9UHV7 523
524 Q9Y6T7-2;Q9Y6T7 524;524
525 Q9Y6T7-2;Q9Y6T7 525;525
I have tried this:
dt$newCol <- rep(dt$id, lengths(str_split(dt$Proteins, ";")))
but it doesn't work, since it creates a vector that is longer than the number of rows.
Something like this?
library(stringr)
df$newCol <- str_replace_all(df$Proteins, "[^;]+", as.character(df$id))
Output
> df
id Proteins newCol
1 522 Q9UHC7-4;Q9UHC7-3;Q9UHC7-2;Q9UHC7 522;522;522;522
2 523 Q9UHV7 523
3 524 Q9Y6T7-2;Q9Y6T7 524;524
4 525 Q9Y6T7-2;Q9Y6T7 525;525
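For reference, here is a minimal base R sketch of why the attempt in the question fails and one way to repair it. It rebuilds the example data by hand, and the helper variable n is only for illustration. rep() with a vector of counts returns one element per protein (9 in total) rather than one per row, so the repeated ids have to be pasted back together row by row:

```r
# Example data from the question, rebuilt by hand
df <- data.frame(
  id = c(522, 523, 524, 525),
  Proteins = c("Q9UHC7-4;Q9UHC7-3;Q9UHC7-2;Q9UHC7",
               "Q9UHV7",
               "Q9Y6T7-2;Q9Y6T7",
               "Q9Y6T7-2;Q9Y6T7"),
  stringsAsFactors = FALSE
)

# Number of semicolon-delimited values in each row
n <- lengths(strsplit(df$Proteins, ";", fixed = TRUE))

# The original attempt: one element per protein, not one per row
too_long <- rep(df$id, n)

# Repeat each id n times and collapse it into a single string per row
df$newCol <- mapply(function(id, k) paste(rep(id, k), collapse = ";"),
                    df$id, n)
```

Here length(too_long) is larger than nrow(df), which is why the assignment in the question fails, while df$newCol has exactly one string per row.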
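A base R alternative, suggested by @markus:
# Map() returns a list, so unlist() it to get an ordinary character column
df1$new <- unlist(Map(gsub, pattern = "[^;]+", replacement = df1$id, x = df1$Proteins), use.names = FALSE)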
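A quick way to check the base R approach on two rows of the example data is sketched below. It uses mapply() instead of Map(), which is a swap on my part rather than what @markus posted: mapply() simplifies its result to a character vector, whereas Map() always returns a list. The pattern "[^;]+" matches each run of characters that are not semicolons, so every protein is replaced by the row's id while the separators stay in place.

```r
# First two rows of the example data, rebuilt by hand
df1 <- data.frame(
  id = c(522, 523),
  Proteins = c("Q9UHC7-4;Q9UHC7-3;Q9UHC7-2;Q9UHC7", "Q9UHV7"),
  stringsAsFactors = FALSE
)

# Map() gives a list with one element per row
as_list <- Map(gsub, pattern = "[^;]+", replacement = df1$id, x = df1$Proteins)

# mapply() runs gsub() row by row and simplifies to a character vector
df1$new <- mapply(gsub, pattern = "[^;]+", replacement = df1$id,
                  x = df1$Proteins, USE.NAMES = FALSE)
```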