Search code examples
rdplyrtidyversetidyr

Separate columns that contain an unequal mixture of numbers and text in R


I have a very odd data frame that looks like this

library(tidyverse)

df <- tibble(position=c(100,200,300),
             correction=c("62M89S", 
                     "8M1D55M88S",
                     "1S25M1P36M89S"))

df
#> # A tibble: 3 × 2
#>   position correction   
#>      <dbl> <chr>        
#> 1      100 62M89S       
#> 2      200 8M1D55M88S   
#> 3      300 1S25M1P36M89S

Created on 2022-03-02 by the reprex package (v2.0.1) the correction column will be used to correct the positions. I want to separate the correction column in a way that my data frame looks like this

df    col1  col2 col3 col4 col5
100                    62M  89S
200          8M  1D    55M  88S
300     1S  25M  1P    36M  89S


Solution

  • Use tidyr::separate with fill = "left":

    library(tidyr)
    library(dplyr)
    df %>% 
      separate(correction, into = str_c("col", 1:5), 
               sep = "(?<=\\D)(?=\\d)", fill = "left")
    
    # A tibble: 3 x 6
      position col1  col2  col3  col4  col5 
         <dbl> <chr> <chr> <chr> <chr> <chr>
    1      100 NA    NA    NA    62M   89S  
    2      200 NA    8M    1D    55M   88S  
    3      300 1S    25M   1P    36M   89S