Tags: r, stringr

How to extract multiple numbers between a repeating pattern using stringr?


I have a column of strings like the one below, where the number after each double colon "::" is an age. In this example, 51, 40, 9, 5, 2, and 15 are the ages. The number before each "::" (0, 1, 2, ...) just indexes the person: first person, second person, and so on. I'd like to extract only the ages.

library(tidyverse)

ex_str = "0::51||1::40||2::9||3::5||4::2||5::15"
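To make the structure concrete, here is a minimal sketch (an alternative to the regex approaches, not the accepted answer) that splits on both delimiters: first on "||" into one record per person, then on "::" into an index and an age. It assumes every record contains exactly one "::"; the names `records` and `parts` are illustrative only.

```r
library(stringr)

ex_str <- "0::51||1::40||2::9||3::5||4::2||5::15"

# One "index::age" record per person; fixed() treats "||" literally, not as regex
records <- str_split(ex_str, fixed("||"))[[1]]

# Two-column character matrix: column 1 = person index, column 2 = age
parts <- str_split_fixed(records, fixed("::"), 2)

ages <- as.integer(parts[, 2])
ages
# [1] 51 40  9  5  2 15
```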

I've tried things like:

ex_str |>
  str_extract_all("::[0-9]+")

but the output still includes the "::" prefix:

[[1]]
[1] "::51" "::40" "::9"  "::5"  "::2"  "::15"
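The attempt above is close: the pattern finds the right matches, and only the leading "::" needs removing. A minimal sketch, assuming the same `ex_str`, that keeps the original pattern and strips the prefix with str_remove() before converting to integers:

```r
library(stringr)

ex_str <- "0::51||1::40||2::9||3::5||4::2||5::15"

# The pattern consumes "::" as well, so every match starts with it
with_prefix <- str_extract_all(ex_str, "::[0-9]+")[[1]]

# Remove the leading "::" from each match, then convert to integers
ages <- as.integer(str_remove(with_prefix, "^::"))
ages
# [1] 51 40  9  5  2 15
```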

I apologize for the simple question. I've watched a few videos and read some guides online, but I just can't figure it out.


Solution

  • You can use str_extract_all() with a regex that includes a positive lookbehind for '::', so the colons must precede the digits but are not part of the match:

    library(tidyverse)
    
    ex_str <- "0::51||1::40||2::9||3::5||4::2||5::15"
    ages <- str_extract_all(ex_str, "(?<=::)\\d+") %>% unlist()
    ages_numeric <- as.numeric(ages)
    print(ages_numeric)
    

    Output:

    [1] 51 40  9  5  2 15
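Since the question mentions a whole column of such strings, here is a sketch applying the same lookbehind pattern inside mutate(). The data frame `df` and its columns `id` and `ages_raw` are hypothetical. Each row gets a list-column of integer ages, which tidyr's unnest_longer() can then expand to one row per person.

```r
library(tidyverse)

# Hypothetical data: `df`, `id`, and `ages_raw` are made-up names for illustration
df <- tibble(
  id = 1:2,
  ages_raw = c("0::51||1::40||2::9||3::5||4::2||5::15", "0::33||1::7")
)

df_ages <- df |>
  mutate(
    # str_extract_all() returns one character vector per row; convert each to integer
    ages = map(str_extract_all(ages_raw, "(?<=::)\\d+"), as.integer)
  )

# Optionally reshape to one row per person instead of a list-column
df_long <- df_ages |>
  select(id, ages) |>
  unnest_longer(ages)

df_long
```

Keeping the list-column preserves one row per original string; unnesting is only needed if you want to summarise or join on individual ages.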