Search code examples
rtidyverse

R Extract specific text from column into multiple columns


I have a dataframe exported from the web with this format

id vals
1  {7,12,58,1}
2  {1,2,5,7}  
3  {15,12} 

I would like to extract ONLY the numbers (ignore curlys and commas) into multiple columns like this

id val_1 val_2 val_3 val_4 val_5
1  7     12    58    1
2  1     2     5     7  
3  15    12  

Even though the Max of values we got was 4 I want to always go up to value val_5.

Thanks!


Solution

  • Another possible tidyverse option, where we remove the curly brackets, then separate the rows on the ,, then pivot to wide form. Then, we can create the additional column (using add_column from tibble) based on the max value in the column names (which is 4 in this case), and then can create val_5.

    library(tidyverse)
    
    df %>%
      mutate(vals = str_replace_all(vals, "\\{|\\}", "")) %>%
      separate_rows(vals, sep=",") %>%
      group_by(id) %>%
      mutate(ind = row_number()) %>%
      pivot_wider(names_from = ind, values_from = vals, names_prefix = "val_") %>%
      add_column(!!(paste0("val_", parse_number(names(.)[ncol(.)])+1)) := NA)
    

    Output

      id val_1 val_2 val_3 val_4 val_5
    1  1     7    12    58     1    NA
    2  2     1     2     5     7    NA
    3  3    15    12  <NA>  <NA>    NA
    

    Data

    df <- read.table(text = "id vals
    1  {7,12,58,1}
    2  {1,2,5,7}
    3  {15,12} ", header = T)