
Regex: Find tagged strings in text


I have the following query string, which contains a few tagged values (key: value pairs), always at the end of the string:

Lorem ipsum age:85 date:15.05.2015 sender: user: John Doe

The leading "Lorem ipsum" should be ignored because it is not a pair. The following pairs are valid:

  • age with 85
  • date with 15.05.2015
  • user with John Doe

A tag should be ignored if nothing follows its colon (as with sender above). A value can contain spaces and runs up to the next tag's key.

Here's what I got so far:

/([\w-]+):\s*(.+?)(?!\s+[\w-]+:)?/g

but for some reason it only seems to match the first character of each value, and it also cuts into the "user" tag (regexr playground):

age:8
date:1
sender: u
ser:J

Any help would be much appreciated!


Solution

  • You may use

    (\w[\w-]*):(?!\s+\w[\w-]*:|\s*$)\s*(.*?)(?=\s+\w[\w-]*:|$)
    

    See the regex demo

    Details

    • (\w[\w-]*) - Capturing group 1: a word char followed with 0+ word or hyphen chars
    • : - a colon
    • (?!\s+\w[\w-]*:|\s*$) - a negative lookahead that fails the match if, immediately to the right of the current location, there is either 1+ whitespaces, a word char followed with 0+ word or hyphen chars and then : (i.e. the next tag's key), or 0+ whitespaces followed by the end of the string (i.e. an empty value)
    • \s* - 0+ whitespaces
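    • (.*?) - Capturing group 2: any zero or more chars other than line break chars, as few as possible, up to the closest position where the lookahead that follows succeeds
    • (?=\s+\w[\w-]*:|$) - a positive lookahead that requires, immediately to the right of the current location, either 1+ whitespaces, a word char followed with 0+ word or hyphen chars and then :, or the end of the string.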
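
As a rough illustration of how the pattern above behaves on the sample string from the question, here is a minimal Python sketch. The pattern is copied unchanged. The helper name extract_tags and the use of Python's re module are assumptions made for this example, not part of the original answer. The question's /.../g syntax suggests JavaScript, where the same pattern should work with matchAll.

```python
import re

# Pattern from the answer, copied unchanged.
TAG_PATTERN = re.compile(
    r"(\w[\w-]*):(?!\s+\w[\w-]*:|\s*$)\s*(.*?)(?=\s+\w[\w-]*:|$)"
)


def extract_tags(query: str) -> dict[str, str]:
    """Hypothetical helper: collect key/value pairs from a query string.

    Group 1 is the tag key, group 2 is its value. Tags with no value
    (such as "sender:" in the sample) are rejected by the negative lookahead.
    """
    return {m.group(1): m.group(2) for m in TAG_PATTERN.finditer(query)}


query = "Lorem ipsum age:85 date:15.05.2015 sender: user: John Doe"
print(extract_tags(query))
# {'age': '85', 'date': '15.05.2015', 'user': 'John Doe'}
```

The value group stays lazy, as in the original attempt, but the mandatory positive lookahead after it forces the match to extend until the next key or the end of the string. In the original attempt the trailing lookahead was optional, so the lazy group could stop after a single character.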