r  regex  dataframe  dplyr  separator

Regex to match every second comma, to separate a column into rows using dplyr


I have strings like this one:

71,72,80,81,102,100

I want to split it after every two numbers, like so:

71,72
80,81
102,100

I wrote this regex:

(([0-9]{1,4}),([0-9]{1,4}))

This highlights the groups I need, but not the commas between them.

In my code I am using dplyr

Example:

library(dplyr)  # tibble(), %>%
library(tidyr)  # separate_rows()

df_example <- tibble(Lotes = "LOT1,LOT2,LOT3", NoModuloPlastico = "71,72,80,81,102,100")

df_result_example <- df_example %>%
  separate_rows(Lotes, sep = ",") %>%
  separate_rows(NoModuloPlastico, sep = "(([0-9]{1,3}),([0-9]{1,3}))")

So what I really need is a regex that matches every second comma, but I can't work out how to write it.

I couldn't adapt the approaches from these links to my needs:

https://bedigit.com/blog/regex-how-to-match-everything-except-a-particular-pattern/

https://blog.codinghorror.com/excluding-matches-with-regular-expressions/

What I get:

Lotes NoModuloPlastico
LOT1 ""
LOT1 ","
LOT1 ","
LOT1 ""
LOT2 ""
LOT2 ","
LOT2 ","
LOT2 ""
LOT3 ""
LOT3 ","
LOT3 ","
LOT3 ""

What I want:

Lotes NoModuloPlastico
LOT1 71,72
LOT2 80,81
LOT3 102,100

Solution

  • You can use a slightly shortened version of Onyambu's solution:

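    library(dplyr)  # mutate(), %>%
    library(tidyr)  # unnest()

    df_example %>%
      mutate(Lotes = strsplit(Lotes, ','),
             # \K drops the matched pair, so only every second comma is the separator
             NoModuloPlastico = strsplit(NoModuloPlastico, '[^,]*,[^,]*\\K,', perl = TRUE)) %>%
      unnest(everything())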
    

    Output:

    # A tibble: 3 x 2
      Lotes NoModuloPlastico
      <chr> <chr>           
    1 LOT1  71,72           
    2 LOT2  80,81           
    3 LOT3  102,100 
    

    NOTES:

    • strsplit(Lotes, ',') splits the Lotes column at each comma.
    • strsplit('[^,]*,[^,]*\\K,', perl=TRUE) splits the NoModuloPlastico column at every second comma. [^,]*,[^,]* matches zero or more non-comma characters, a comma, and zero or more non-comma characters; \K discards the text matched so far from the match; the final , then matches the comma that is used as the split point. perl=TRUE is required because \K is PCRE syntax.
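
To see the pattern in isolation, here is a minimal base R sketch that applies it to the sample string from the question, without any data frame. The variable names `x`, `pairs` and `extracted` are only illustrative. The second half shows an alternative that is not part of the answer above: instead of splitting at every second comma, it extracts the number pairs directly, which is closer to what the regex in the question was matching. It assumes every field is made of digits and that the string holds an even number of fields.

```r
# Sample string taken from the question.
x <- "71,72,80,81,102,100"

# Split at every second comma. \K drops the already-matched pair from the
# match, so only the trailing comma is consumed as the separator.
pairs <- strsplit(x, "[^,]*,[^,]*\\K,", perl = TRUE)[[1]]
print(pairs)
# [1] "71,72"   "80,81"   "102,100"

# Alternative (an assumption, not from the answer): extract the pairs
# instead of splitting. gregexpr() finds non-overlapping matches from left
# to right and regmatches() returns the matched substrings.
extracted <- regmatches(x, gregexpr("[0-9]+,[0-9]+", x))[[1]]
print(identical(pairs, extracted))
# [1] TRUE
```

The split version also works when the fields are not numeric, since it only looks at commas. The extraction version is stricter about what counts as a field, and it would silently drop a trailing unpaired number.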