r  delimiter  text-extraction

Find and extract text between delimiters in R


I have the following data:

    Seat_WASHER <-
      structure(
        list(
          Description = c(
            "SEAT WASHER, MR2, 8\", TN 10.12, CR 150/600, 316 Stainless Steel",
            "SEAT WASHER, 1\", TN 1.42, CR 950/1200, MR1, 316 Stainless Steel",
            "SEAT WASHER, 3\", TN 1.52,  MR1, 316 Stainless Steel",
            "SEAT WASHER, MR1, 2\", TN 1.62, CR 800/1200, 316 Stainless Steel",
            "SEAT WASHER, MR1, TN 2.12, 1/2\", CR 150/600, 316 Stainless Steel",
            "SEAT WASHER, MR6, 2\", TN 6.48, CR 750/100, 316 Stainless Steel"
          )
        ),
        # six descriptions, so six rows
        row.names = c(NA, -6L),
        class = c("tbl_df", "tbl", "data.frame")
      )

It's a very large data set, and the strings are not consistent in their order or contents.

How do I find key indicators (", CR, MR) and pull the data between the delimiters into a column? If a key indicator can't be found in a string, the output for that row should be NULL.

Finding all CR will result in a column like:

Col 1 
--------
CR 150/600
CR 950/1200
NULL
CR 800/1200
CR 150/600
CR 750/100


Solution

  • You can try str_extract() from stringr. Rows with no match come back as NA rather than NULL:

    library(stringr)
    
    Seat_WASHER$col1 <- str_extract(Seat_WASHER$Description , "CR \\d+/\\d+")
    
  • Output

             col1
    1  CR 150/600
    2 CR 950/1200
    3        <NA>
    4 CR 800/1200
    5  CR 150/600
    6  CR 750/100
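
The question also asks about the MR and inch (") indicators, which the answer above does not cover. The following is a minimal sketch of how the same idea might be extended, written in base R so it runs without stringr. The helper extract_first() and the three patterns are hypothetical: they are inferred only from the six sample rows and would likely need adjusting for the full data set.

```r
# Same six descriptions as in the question.
Description <- c(
  "SEAT WASHER, MR2, 8\", TN 10.12, CR 150/600, 316 Stainless Steel",
  "SEAT WASHER, 1\", TN 1.42, CR 950/1200, MR1, 316 Stainless Steel",
  "SEAT WASHER, 3\", TN 1.52,  MR1, 316 Stainless Steel",
  "SEAT WASHER, MR1, 2\", TN 1.62, CR 800/1200, 316 Stainless Steel",
  "SEAT WASHER, MR1, TN 2.12, 1/2\", CR 150/600, 316 Stainless Steel",
  "SEAT WASHER, MR6, 2\", TN 6.48, CR 750/100, 316 Stainless Steel"
)

# Hypothetical helper: first match of `pattern` in each string, NA if none.
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)   # -1 where there is no match
  out <- rep(NA_character_, length(x))
  out[m > 0] <- regmatches(x, m)          # regmatches() returns matches only
  out
}

# Assumed patterns, one per key indicator.
patterns <- c(
  CR   = "CR \\d+/\\d+",       # e.g. CR 150/600
  MR   = "MR\\d+",             # e.g. MR2
  Size = "\\d+(?:/\\d+)?\""    # e.g. 8" or 1/2"
)

# One output column per indicator.
Seat_WASHER <- data.frame(Description = Description, stringsAsFactors = FALSE)
for (nm in names(patterns)) {
  Seat_WASHER[[nm]] <- extract_first(Seat_WASHER$Description, patterns[[nm]])
}

print(Seat_WASHER[names(patterns)])
```

Only the first match per row is kept, so a description that contains the same indicator twice would need a different approach (for example gregexpr() in place of regexpr()). The third row has no CR entry, so its CR value is NA.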