r dplyr tidyverse tidyr

Separate over several delimiting factors in R tidyr


I want to split the complex_names column in my df1 at the second "_", so that everything before it goes into one column and everything after it into another. How can I do this with tidyr?

library(tidyverse)

df1 <- tibble(complex_names=c("King_Arthur_II", "Queen_Elizabeth_I", "King_Charles_III"), 
       year=c(970,1920,2022)
)
df1 
#> # A tibble: 3 × 2
#>   complex_names      year
#>   <chr>             <dbl>
#> 1 King_Arthur_II      970
#> 2 Queen_Elizabeth_I  1920
#> 3 King_Charles_III   2022

df1 |> 
separate(complex_names,into = c("name", "number"), sep="the second comma")
#> Error in into("name", "number"): could not find function "into"

Created on 2022-09-27 with reprex v2.0.2

I want my data to look like this:

name           number  year 
King_Arthur      II     970
...

Solution

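  • I'm no expert in regex, but this answer shows a regular expression that finds the last underscore in a string: an underscore followed only by non-underscore characters up to the end. Because every name here contains exactly two underscores, the last underscore is also the second one. You can pass this regular expression to separate() as sep: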

    library(tidyverse)
    
    df1 <- tibble(complex_names=c("King_Arthur_II", "Queen_Elizabeth_I", "King_Charles_III"), 
                  year=c(970,1920,2022)
    )
    df1 
    #> # A tibble: 3 × 2
    #>   complex_names      year
    #>   <chr>             <dbl>
    #> 1 King_Arthur_II      970
    #> 2 Queen_Elizabeth_I  1920
    #> 3 King_Charles_III   2022
    
    
    df1 |> 
      separate(complex_names, into = c("name", "number"), sep = "(_)(?=[^_]+$)")
    #> # A tibble: 3 × 3
    #>   name            number  year
    #>   <chr>           <chr>  <dbl>
    #> 1 King_Arthur     II       970
    #> 2 Queen_Elizabeth I       1920
    #> 3 King_Charles    III     2022
    

    Created on 2022-09-27 with reprex v2.0.2
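
If some names could contain more than two underscores, the regular expression above would split at the last underscore rather than the second. A minimal sketch of one alternative, assuming the split should always happen at the second underscore, is to use tidyr's extract() with two capture groups. The data frame df2 and its third row are hypothetical and only there to show the difference:

```r
library(tidyr)
library(tibble)

# Hypothetical data: the third row has three underscores (made up for illustration)
df2 <- tibble(
  complex_names = c("King_Arthur_II", "Queen_Elizabeth_I", "King_Henry_VIII_Tudor"),
  year = c(970, 1920, 1509)
)

result <- df2 |>
  tidyr::extract(
    complex_names,
    into = c("name", "number"),
    # group 1: everything before the second "_"
    # group 2: everything after the second "_"
    regex = "^([^_]*_[^_]*)_(.*)$"
  )

result
```

Here the third row should come out as name "King_Henry" and number "VIII_Tudor", whereas the sep pattern used with separate() would give "King_Henry_VIII" and "Tudor". Which behaviour is right depends on what the real data looks like.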