For example, I have a data table with several columns:
column A column B
key_500:station and loc 2
spectra:key_600:type 9
alpha:key_100:number 12
I want to split the rows of column A into components and create new columns, guided by the following rules:
key_
" and ":
" will be var1
,:
" will be var2
,column A
should retain the part of string that is prior to ":key_
". If it is empty (as in the first line), then replace ""
with an "effect
" word.My expected final data table should be like this one:
column A column B var1 var2
effect 2 500 station and loc
spectra 9 600 type
alpha 12 100 number
Using tidyr
extract
you can extract specific part of the string using regex.
tidyr::extract(df, columnA, into = c('var1', 'var2'), 'key_(\\d+):(.*)',
convert = TRUE, remove = FALSE) %>%
dplyr::mutate(columnA = sub(':?key_.*', '', columnA),
columnA = replace(columnA, columnA == '', 'effect'))
# columnA var1 var2 columnB
#1 effect 500 station and loc 2
#2 spectra 600 type 9
#3 alpha 100 number 12
If you want to use data.table
you can break this down in steps :
library(data.table)
setDT(df)
df[, c('var1', 'var2') := .(sub('.*key_(\\d+).*', '\\1',columnA),
sub('.*key_\\d+:', '', columnA))]
df[, columnA := sub(':?key_.*', '', columnA)]
df[, columnA := replace(columnA, columnA == '', 'effect')]
data
df <- structure(list(columnA = c("key_500:station and loc",
"spectra:key_600:type", "alpha:key_100:number"),
columnB = c(2L, 9L, 12L)), class = "data.frame", row.names = c(NA, -3L))