I have the following query string, which contains a couple of tagged values (key: value
pairs), always at the end of the string:
Lorem ipsum age:85 date:15.05.2015 sender: user: John Doe
The "Lorem ipsum" is a string that should be ignored as it's not a pair. The following pairs are valid:
- age with 85
- date with 15.05.2015
- user with John Doe
A tag should be ignored if no content can be found after the colon (as with sender above). A tag's content can also include spaces, up to the next tag's key.
Here's what I got so far:
/([\w-]+):\s*(.+?)(?!\s+[\w-]+:)?/g
but for some reason it only seems to match the first character of each value, and it also cuts into the "user" tag (regexr playground):
age:8
date:1
sender: u
ser:J
Any help would be much appreciated!
You may use
(\w[\w-]*):(?!\s+\w[\w-]*:|\s*$)\s*(.*?)(?=\s+\w[\w-]*:|$)
See the regex demo
Details
- (\w[\w-]*) - Capturing group 1: a word char followed with 0+ word or hyphen chars
- : - a colon
- (?!\s+\w[\w-]*:|\s*$) - the negative lookahead fails the match if, immediately to the right of the current location, there are 1+ whitespaces, a word char followed with 0+ word or hyphen chars and then a colon, or 0+ whitespaces at the end of the string
- \s* - 0+ whitespaces
- (.*?) - Group 2: any zero or more chars other than line break chars, as few as possible, up to the closest...
- (?=\s+\w[\w-]*:|$) - 1+ whitespaces, a word char followed with 0+ word or hyphen chars and then a colon, or just the end of the string.
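As a minimal sketch of how the pattern could be applied in JavaScript (the question's regex literal with the g flag suggests that flavor, but that is an assumption), the pairs can be collected with String.prototype.matchAll. The helper name parseTags and the constant TAG_PATTERN are made up for this illustration, and matchAll needs an ES2020 runtime:

```javascript
// Pattern from the answer; the g flag is required by matchAll.
const TAG_PATTERN = /(\w[\w-]*):(?!\s+\w[\w-]*:|\s*$)\s*(.*?)(?=\s+\w[\w-]*:|$)/g;

// parseTags is a hypothetical helper: it returns an object mapping
// each tag key (group 1) to its value (group 2).
function parseTags(query) {
  const tags = {};
  for (const [, key, value] of query.matchAll(TAG_PATTERN)) {
    tags[key] = value;
  }
  return tags;
}

const query = "Lorem ipsum age:85 date:15.05.2015 sender: user: John Doe";
console.log(parseTags(query));
// { age: '85', date: '15.05.2015', user: 'John Doe' }
```

The empty sender tag is skipped because the negative lookahead after the colon sees the next key (user:) right behind the whitespace, and "Lorem ipsum" never matches because neither word is followed by a colon.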