Extracting information between special characters in a column in R

I'm sorry, because I feel like versions of this question have been asked many times, but I simply cannot find code from other examples that works in this case. I have a column where all the information I want is stored between two sets of "%%", and I want to extract the text between those two delimiters and put it into a new column, in this case called df$empty.

This is a long column, but in all cases I just want the information between the two sets of "%%". Is there a way to code this out across the whole column?

To be specific, in this example I want a new column containing "information", "wanted".


empty <- c('NA', 'NA')
information <- c('notimportant%%information%%morenotimportant', 'ignorethis%%wanted%%notthiseither')

df <- data.frame(information, empty)

Solution

In this case you can do:

df$empty <- sapply(strsplit(df$information, '%%'), '[', 2)

#                                   information       empty
# 1 notimportant%%information%%morenotimportant information
# 2           ignorethis%%wanted%%notthiseither      wanted

That is, split each string on '%%' and take the second element of each resulting vector.
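
To make the intermediate step visible, here is a minimal sketch of the same idea done in two steps. The names parts and middle are just illustrative, and using fixed = TRUE and vapply() is my own choice rather than something the one-liner requires; fixed = TRUE treats '%%' as a literal string instead of a regular expression.

```r
information <- c('notimportant%%information%%morenotimportant',
                 'ignorethis%%wanted%%notthiseither')

# Split every string on the literal delimiter '%%'.
# The result is a list with one character vector per input string.
parts <- strsplit(information, '%%', fixed = TRUE)

# Take element 2 of each vector. If a string has no delimiter,
# p[2] is NA, so the result still has one entry per row.
middle <- vapply(parts, function(p) p[2], character(1))

print(middle)
```

Assigning middle to df$empty gives the same column as the sapply() version.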

Or you can get the same result using sub():

df$empty <- sub('.*%%(.+)%%.*', '\\1', df$information)
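
One thing to be aware of with sub(): when the pattern does not match, the input string is returned unchanged. If some rows might lack the '%%' pair, a sketch along these lines returns NA for them instead. The third element of x is a hypothetical row added only for illustration, and pattern and extracted are illustrative names.

```r
x <- c('notimportant%%information%%morenotimportant',
       'ignorethis%%wanted%%notthiseither',
       'nodelimitershere')  # hypothetical row without any '%%'

pattern <- '.*%%(.+)%%.*'

# Only substitute where the pattern matches; otherwise return NA.
extracted <- ifelse(grepl(pattern, x),
                    sub(pattern, '\\1', x),
                    NA_character_)

print(extracted)
```

With the two-row data frame from the question every row matches, so the plain sub() call is enough there.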