Search code examples
rtidyverse

split a column in to multiple columns based on patterns and delimiter in R


I have a data frame df, in which one of the column is as mentioned below. I want to split the V1 column to multiple column as mentioned in the desired output.

  df = read.table(text = " parameter_1_day_10
                    parameter_1_day_3 
                    parameter_2_last_day")

desired output

df2 = read.table(text = " 1 day_10
                 1 day_3 
                2 last_day")

Solution

  • You can do this with separate_wider_regex() from tidyr. The syntax of the regex is a bit strange. Essentially, you identify the parts of the string that will get turned into variables by naming those parts in the regex in a vector. So, the regex that gets used is the one below:

    c("parameter_", c1="[^_]*", "_", c2=".*")
    

    because "parameter_" isn't named, it will to show up as one of the columns. anything that is not an underscore ([^_]*) directly after "parameter_" until the next underscore will go into the variable named c1. The following underscore will not be saved and then everything after that underscore is saved as c2.

    library(dplyr)
    library(tidyr)
    df = read.table(text = " parameter_1_day_10
                      parameter_1_day_3 
                      parameter_2_last_day")
    
    df %>% 
      separate_wider_regex(V1, c("parameter_", c1="[^_]*", "_", c2=".*"))
    #> # A tibble: 3 × 2
    #>   c1    c2      
    #>   <chr> <chr>   
    #> 1 1     day_10  
    #> 2 1     day_3   
    #> 3 2     last_day
    

    Created on 2024-04-03 with reprex v2.0.2