R - separate with a specific symbol, the vertical bar, |


I have a dataset with a column containing the symbol '|' (it comes from the interaction of 2 variables in a model), and I want to split the column on this character.

The function separate works well with standard characters. Do you know how I can specify the character '|'?

library(tidyverse)
df <- data.frame(Interaction = c('var1|var2'))

# as expected
df %>% separate(Interaction, c('var1', 'var2'), sep = '1')
#   var1  var2
# 1  var |var2

# not as expected
df %>% separate(Interaction, c('var1', 'var2'), sep = '|')
#   var1 var2
# 1         v

Solution

  • We can escape the | with \\, because | is a regex metacharacter meaning OR and sep is interpreted as a regular expression by default

    If we look at the ?separate documentation,

    separate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE, convert = FALSE, extra = "warn", fill = "warn", ...)

    and it is described as

    sep - If character, is interpreted as a regular expression. The default value is a regular expression that matches any sequence of non-alphanumeric values.

    df %>% 
      separate(Interaction, c('var1', 'var2'), sep = '\\|')
    

    or place it in square brackets (a character class), where | is treated literally

    df %>% 
       separate(Interaction, c('var1', 'var2'), sep = '[|]')
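
As a rough check of the two patterns above, here is a minimal sketch in base R. It uses strsplit in place of separate so that it runs without the tidyverse, which is an assumption of this example and not part of the original answer. The names x, escaped, bracketed, literal and unescaped are illustrative only.

```r
x <- "var1|var2"

# Escaped pipe: the regex engine sees a literal '|'
escaped <- strsplit(x, "\\|")[[1]]

# A pipe inside a character class is also taken literally
bracketed <- strsplit(x, "[|]")[[1]]

# fixed = TRUE switches off regex interpretation altogether
literal <- strsplit(x, "|", fixed = TRUE)[[1]]

print(escaped)
print(bracketed)
print(literal)

# All three approaches should agree on the split
stopifnot(identical(escaped, bracketed), identical(escaped, literal))

# Unescaped pipe: "nothing OR nothing", so the pattern can match the
# empty string and the split does not happen at the '|' alone
unescaped <- strsplit(x, "|")[[1]]
print(unescaped)
```

The fixed = TRUE variant applies to strsplit only. With separate, the escaped or bracketed pattern shown in the answer is the way to go, since sep is documented as a regular expression.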