r, regex, regex-lookarounds, stringr, regex-greedy

Regex for matching between a colon and last newline prior to next colon


I am trying to parse a string with regex to pull out information between a colon and the last newline prior to the next colon. How can I do this?

string <- "Name: Al's\nPlace\nCountry:\nState\n/ Province: RI\n"
stringr::str_extract_all(string, "(?<=:)(.*)(?:\\n)")

but I get:

[[1]]
[1] " Al's\n" " \n"  " RI\n" 

when I want:

[[1]]
[1] " Al's\nPlace\n" " \n"  " RI\n" 

Solution

  • I'm not sure this is exactly what you're after, since your desired output looks a bit different.

    :((?:.*\\n?)+?)(?=.*:|$)
    
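    • : matches a colon
    • ((?:.*\n?)+?) lazily matches and captures one or more lines, each up to and including an optional \n (written \\n inside an R string)
    • (?=.*:|$) stops as soon as the line ahead contains a colon, or the end of the string is reached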

    See this demo at regex101
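
As a minimal sketch of how the pattern above could be applied in R, assuming the stringr package is installed: str_match_all() is used here instead of the question's str_extract_all(), so that capture group 1 (the text after the colon) can be pulled out without the colon itself. The variable names pattern, matches and values are illustrative only.

```r
library(stringr)

string <- "Name: Al's\nPlace\nCountry:\nState\n/ Province: RI\n"

# Pattern from the answer; backslashes are doubled inside the R string.
pattern <- ":((?:.*\\n?)+?)(?=.*:|$)"

# str_match_all() returns one character matrix per input string:
# column 1 holds the full match, column 2 holds capture group 1.
matches <- str_match_all(string, pattern)[[1]]
values <- matches[, 2]

print(values)
# [1] " Al's\nPlace\n" "\nState\n"      " RI\n"
```

Note that the second element is "\nState\n" rather than the " \n" requested in the question, which is the difference the answer hedges about. If surrounding whitespace is not wanted, str_trim(values) would strip it.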