I have a column of unique photo IDs in a data frame. Each photo ID is comprised of a long string of characters and separated by a "_".
I want to split these IDs into columns using the delimiter "_".
This sounds easy enough, but I would like to split the characters into two specific columns at the last delimiter. For example if one of my photo IDs were:
"TV_Banana_122_Afternoon_Pre"
And I am trying to get the resulting split to be:
"TV_Banana_122_Afternoon" and "Pre"
Is there any way to specify that I want to separate this phrase by the 4th delimiter only?
You could use strsplit
and the following regular expression to separate the string and then do.call
and rbind
to create a new dataframe with each part in its own column.
CODE TO MATCH SPECIFIC # OF UNDERSCORES
df <- data.frame(x = c("TV_Banana_122_Afternoon_Pre"))
df_new <- data.frame(do.call("rbind", strsplit(sub('(^[^_]+_[^_]+_[^_]+_[^_]+)_(.*)$', '\\1 \\2', df), ' ')))
df_new
OUTPUT
X1 X2
1 TV_Banana_122_Afternoon Pre
Per the comment by @AnilGoyal, if you needed to match an additional underscore you would just need to add an additional _[^_]+
to the first match in sub
. See example below.
CODE TO MATCH A STRING WITH AN ADDITIONAL UNDERSCORE
df2 <- data.frame(x = c("TV_Banana_122_Afternoon_Test_Pre"))
df2_new <- data.frame(do.call("rbind", strsplit(sub('(^[^_]+_[^_]+_[^_]+_[^_]+_[^_]+)_(.*)$', '\\1 \\2', df2), ' ')))
df2_new
OUTPUT
X1 X2
1 TV_Banana_122_Afternoon_Test Pre
Also, if you have strings with varying #s of underscores, but you always want to split at the last underscore, you could just match the whole string up to the last underscore per the regex below.
CODE TO MATCH THE LAST UNDERSCORE
df_new2 <- data.frame(do.call("rbind", strsplit(sub('(.*)_(.*)$', '\\1 \\2', df), ' ')))
df_new2
OUTPUT
X1 X2
1 TV_Banana_122_Afternoon Pre