Separating data by delimiter in R: How can I specify at which delimiter (for example the 4th in a series of 5) that characters are separated?

I have a column of unique photo IDs in a data frame. Each photo ID is comprised of a long string of characters and separated by a "_".

I want to split these IDs into columns using the delimiter "_".

This sounds easy enough, but I would like to split the characters into two specific columns at the last delimiter. For example if one of my photo IDs were:

"TV_Banana_122_Afternoon_Pre"

And I am trying to get the resulting split to be:

"TV_Banana_122_Afternoon" and "Pre"

Is there any way to specify that I want to separate this phrase by the 4th delimiter only?

Solution

You could use strsplit and the following regular expression to separate the string and then do.call and rbind to create a new dataframe with each part in its own column.

CODE TO MATCH SPECIFIC # OF UNDERSCORES

df <- data.frame(x = c("TV_Banana_122_Afternoon_Pre"))

df_new <- data.frame(do.call("rbind", strsplit(sub('(^[^_]+_[^_]+_[^_]+_[^_]+)_(.*)$', '\\1 \\2', df), ' ')))

df_new

OUTPUT

                       X1  X2
1 TV_Banana_122_Afternoon Pre

Per the comment by @AnilGoyal, if you needed to match an additional underscore you would just need to add an additional _[^_]+ to the first match in sub. See example below.

CODE TO MATCH A STRING WITH AN ADDITIONAL UNDERSCORE

df2 <- data.frame(x = c("TV_Banana_122_Afternoon_Test_Pre"))

df2_new <- data.frame(do.call("rbind", strsplit(sub('(^[^_]+_[^_]+_[^_]+_[^_]+_[^_]+)_(.*)$', '\\1 \\2', df2), ' ')))

df2_new

OUTPUT

                            X1  X2
1 TV_Banana_122_Afternoon_Test Pre

Also, if you have strings with varying #s of underscores, but you always want to split at the last underscore, you could just match the whole string up to the last underscore per the regex below.

CODE TO MATCH THE LAST UNDERSCORE

df_new2 <- data.frame(do.call("rbind", strsplit(sub('(.*)_(.*)$', '\\1 \\2', df), ' ')))
df_new2

OUTPUT

                   X1  X2
1 TV_Banana_122_Afternoon Pre