I am using the following regex to match the StatsD data format:
^[\w.]+:.+\|.\|#(?:[\w.]+:[^,\n]+(?:,|$))*$
It matches any of the following formats:
performance.os.disk:1099511627776|g|#region:us-west-1,datacenter:us-west-1a
or
performance.os.disk:1099511627776|g|#
or
performance.os.disk:1099511627776|g|#region:us-west-1
But it fails to match this one:
datastore.reads:9876|ms
Any help?
Regex101 demo to try: https://regex101.com/r/H8vQTa/1/
You may use
^[\w.]+:[^|]+\|[^|]+(?:\|#(?:[\w.]+:[^,\n]+(?:,|$))*)?$
See the regex demo
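The problem is that you match exactly one arbitrary char with . between the two | chars, and the |# part is mandatory, so a line like datastore.reads:9876|ms (a two-char type and no tags) can never match. I suggest matching 1 or more chars other than | there instead, and making the rest optional by wrapping \|#(?:[\w.]+:[^,\n]+(?:,|$))* within an optional non-capturing group, (?:...)?.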
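To check the difference outside Regex101, here is a small sketch using Python's re module that runs both the pattern from the question and the suggested one against the four sample lines. The constant names ORIGINAL, FIXED and SAMPLES are just illustrative; the patterns are copied verbatim from the question and the answer.

```python
import re

# Pattern from the question: exactly one char between the pipes,
# and a mandatory "|#" tag section.
ORIGINAL = re.compile(r'^[\w.]+:.+\|.\|#(?:[\w.]+:[^,\n]+(?:,|$))*$')

# Suggested pattern: 1+ non-| chars for the type, and the whole
# "|#..." tag section wrapped in an optional non-capturing group.
FIXED = re.compile(r'^[\w.]+:[^|]+\|[^|]+(?:\|#(?:[\w.]+:[^,\n]+(?:,|$))*)?$')

# The sample lines from the question.
SAMPLES = [
    "performance.os.disk:1099511627776|g|#region:us-west-1,datacenter:us-west-1a",
    "performance.os.disk:1099511627776|g|#",
    "performance.os.disk:1099511627776|g|#region:us-west-1",
    "datastore.reads:9876|ms",
]

for line in SAMPLES:
    print(f"{line!r}: original={bool(ORIGINAL.match(line))}, fixed={bool(FIXED.match(line))}")
```

The original pattern rejects only the last sample, while the suggested one accepts all four.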
Details
^ - start of string
[\w.]+ - 1+ word or . chars
: - a colon
[^|]+ - a negated character class matching 1+ chars other than |
\| - a | char
[^|]+ - 1+ chars other than |
(?:\|#(?:[\w.]+:[^,\n]+(?:,|$))*)? - an optional non-capturing group matching 1 or 0 occurrences of:
    \|# - a |# substring
    (?:[\w.]+:[^,\n]+(?:,|$))* - 0 or more consecutive repetitions of:
        [\w.]+: - 1+ word or . chars and then :
        [^,\n]+ - 1+ chars other than , and LF (I guess the LF is excluded for debugging purposes here)
        (?:,|$) - a , or end of string
$ - end of string.
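The same pattern can be written in Python with re.VERBOSE so that each token carries the description from the breakdown above. This is only a sketch: the field labels in the comments (metric name, value, metric type, tags) are my reading of the StatsD line layout and are not enforced by the pattern. In verbose mode, whitespace outside character classes is ignored and an unescaped # starts a comment, so the literal # of |# has to be escaped as \#.

```python
import re

# Suggested pattern in verbose form; it behaves like the one-line version.
STATSD_LINE = re.compile(r"""
    ^
    [\w.]+          # metric name: 1+ word or . chars
    :               # a colon
    [^|]+           # value: 1+ chars other than |
    \|              # a | char
    [^|]+           # metric type: 1+ chars other than |
    (?:             # optional tag section, matched 1 or 0 times:
        \|\#        #   the literal |# substring (# escaped for verbose mode)
        (?:         #   0 or more consecutive tags:
            [\w.]+: #     1+ word or . chars, then :
            [^,\n]+ #     1+ chars other than , and LF
            (?:,|$) #     a , or end of string
        )*
    )?
    $               # end of string
""", re.VERBOSE)

print(bool(STATSD_LINE.match("datastore.reads:9876|ms")))
```

Keeping the annotated form in code makes it easier to adjust a single part later, for example restricting the type to a fixed set of values, without reworking the whole one-liner.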