Splitting strings into components

For example, I have a data table with several columns:

column A                      column B
key_500:station and loc       2
spectra:key_600:type          9
alpha:key_100:number          12

I want to split the rows of column A into components and create new columns, guided by the following rules:

the value between "key_" and ":" will be var1,
the next value after ":" will be var2,
the original column A should retain the part of string that is prior to ":key_". If it is empty (as in the first line), then replace "" with an "effect" word.

My expected final data table should be like this one:

column A   column B  var1  var2
effect     2         500   station and loc
spectra    9         600   type
alpha      12        100   number

Solution

Using tidyr extract you can extract specific part of the string using regex.

tidyr::extract(df, columnA, into = c('var1', 'var2'), 'key_(\\d+):(.*)', 
               convert = TRUE, remove = FALSE) %>%
  dplyr::mutate(columnA = sub(':?key_.*', '', columnA), 
         columnA = replace(columnA, columnA == '', 'effect'))

#  columnA var1            var2 columnB
#1  effect  500 station and loc       2
#2 spectra  600            type       9
#3   alpha  100          number      12

If you want to use data.table you can break this down in steps :

library(data.table)
setDT(df)
df[, c('var1', 'var2') := .(sub('.*key_(\\d+).*', '\\1',columnA), 
                            sub('.*key_\\d+:', '', columnA))]
df[, columnA := sub(':?key_.*', '', columnA)]
df[, columnA := replace(columnA, columnA == '', 'effect')]

data

df <-  structure(list(columnA = c("key_500:station and loc", 
"spectra:key_600:type", "alpha:key_100:number"), 
columnB = c(2L, 9L, 12L)), class = "data.frame", row.names = c(NA, -3L))