Search code examples
rstrsplit

Parse By nth delimitor in R


I have a dataframe like below:

Col1    Col2
   A    5!5!!6!!3!!m
   B    7_8!!6!!7!!t

structure(list(Col1 = c("A", "B"), Col2 = c("5!5!!6!!3!!m", "7_8!!6!!7!!t" )), class = "data.frame", row.names = c(NA, -2L))

How do I create a new column that extracts the 3rd parse of the strings found in Col2?

In SQL I am using SPLIT_PART function:

SPLIT_PART(Col2, '!!', 3)

I am looking to find an equivalent function in R.

Expected output:

Col1            Col2    Col3
   A    5!5!!6!!3!!m       3
   B    7_8!!6!!7!!t       7

Solution

  • Here's a tidyverse option, though the core is funcitonally identical to Rushabh's data.table based answer.

    When given the simplify=T argument, stringr::str_split will output a matrix, with each match in a column. You can subset the desired column from it to extract the desired position:

    library(tidyverse)
    
    df1 %>%
        mutate(Col3 = str_split(Col2, pattern = '!!', simplify=T)[,3])
    
      Col1         Col2 Col3
    1    A 5!5!!6!!3!!m  5!5
    2    B 7_8!!6!!7!!t  7_8
    
    df1 %>%
        mutate(Col3 = str_split(Col2, pattern = '!!', simplify=T)[,2])
    
      Col1         Col2 Col3
    1    A 5!5!!6!!3!!m    6
    2    B 7_8!!6!!7!!t    6
    
    df1 %>%
      mutate(Col3 = str_split(Col2, pattern = '!!', simplify=T)[,1])
    
      Col1         Col2 Col3
    1    A 5!5!!6!!3!!m  5!5
    2    B 7_8!!6!!7!!t  7_8