java regex statsd

Regex to match StatsD format


I am using the following regex to match the StatsD data format:

^[\w.]+:.+\|.\|#(?:[\w.]+:[^,\n]+(?:,|$))*$

It matches each of the following lines:

performance.os.disk:1099511627776|g|#region:us-west-1,datacenter:us-west-1a

or

performance.os.disk:1099511627776|g|#

or

performance.os.disk:1099511627776|g|#region:us-west-1

But it does not match a line that has no |# section:

datastore.reads:9876|ms

Any help?

Regex101 demo to try: https://regex101.com/r/H8vQTa/1/


Solution

  • You may use

    ^[\w.]+:[^|]+\|[^|]+(?:\|#(?:[\w.]+:[^,\n]+(?:,|$))*)?$
                   ^^^^^^^^                             ^^
    

    See the regex demo

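    The original pattern fails on datastore.reads:9876|ms for two reasons: the . between the two |s matches exactly one character, so a two-character type such as ms cannot match, and the |# part is mandatory. I suggest matching 1 or more chars other than | there, and making the rest optional by wrapping \|#(?:[\w.]+:[^,\n]+(?:,|$))* in an optional non-capturing group, (?:...)?.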

    Details

    • ^ - start of string
    • [\w.]+ - 1+ word or . chars
    • : - a colon
    • [^|]+ - a negated character class matching 1+ non-| chars
    • \| - a | char
    • [^|]+ - 1+ chars other than |
    • (?:\|#(?:[\w.]+:[^,\n]+(?:,|$))*)? - an optional non-capturing group matching 1 or 0 occurrences of
      • \|# - |# substring
      • (?:[\w.]+:[^,\n]+(?:,|$))* - 0 or more consecutive repetitions of
        • [\w.]+: - 1+ word or . chars and then :
        • [^,\n]+ - 1+ chars other than , and LF (I guess the \n is only there for testing against multi-line input in the demo)
        • (?:,|$) - , or end of string
    • $ - end of string.
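
Since the question is tagged java, here is a minimal sketch of how the two patterns could be compared in Java. The class name StatsdLineMatcher, the helper methods, and the named groups (name, value, type, tags) are my own additions for illustration and are not part of the original regex. Apart from those groups, the pattern is the one shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
final class StatsdLineMatcher {

    // Regex from the question: one type character and a mandatory "|#" section.
    static final Pattern ORIGINAL = Pattern.compile(
            "^[\\w.]+:.+\\|.\\|#(?:[\\w.]+:[^,\\n]+(?:,|$))*$");

    // Regex from the answer. The named groups are an assumption added for
    // extraction; they do not change which strings match.
    static final Pattern FIXED = Pattern.compile(
            "^(?<name>[\\w.]+):(?<value>[^|]+)\\|(?<type>[^|]+)"
            + "(?:\\|#(?<tags>(?:[\\w.]+:[^,\\n]+(?:,|$))*))?$");

    // True when the whole line matches the given pattern.
    static boolean isValid(Pattern pattern, String line) {
        return pattern.matcher(line).matches();
    }

    // Returns the key/value pairs after "|#", or an empty map when there are none.
    static Map<String, String> parseTags(String line) {
        Matcher m = FIXED.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a StatsD line: " + line);
        }
        Map<String, String> tags = new LinkedHashMap<>();
        String raw = m.group("tags"); // null when the optional group did not take part
        if (raw == null || raw.isEmpty()) {
            return tags;
        }
        for (String pair : raw.split(",")) {
            // The key cannot contain ':', so the first colon separates key and value.
            int colon = pair.indexOf(':');
            tags.put(pair.substring(0, colon), pair.substring(colon + 1));
        }
        return tags;
    }

    public static void main(String[] args) {
        String[] samples = {
            "performance.os.disk:1099511627776|g|#region:us-west-1,datacenter:us-west-1a",
            "performance.os.disk:1099511627776|g|#",
            "performance.os.disk:1099511627776|g|#region:us-west-1",
            "datastore.reads:9876|ms"
        };
        for (String line : samples) {
            System.out.println(line);
            System.out.println("  original: " + isValid(ORIGINAL, line)
                    + ", fixed: " + isValid(FIXED, line)
                    + ", tags: " + parseTags(line));
        }
    }
}
```

For the four sample lines from the question, the original pattern accepts the first three and rejects datastore.reads:9876|ms, while the fixed pattern accepts all four. Note that every backslash in the regex has to be doubled inside a Java string literal.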