Regex: Group OR Group


I have a text file coming in, bar-delimited, where a field is supposed to hold city comma state. An example:

|Boston, MA|

In my application I need to capture the city and state as two different fields. This is part of a larger regex that processes an entire line, but the part that handles this field is:

\|(.+),(.+[^|]+)\|

And this captures the text before the comma into one group and the text after into another group. Works great when the field is filled out as expected.

My problem is that sometimes this field will come in with either:

  • No information between the bars (||)

  • Text without a comma (|unknown|)

And I need to modify this regex so that:

  • If there is no information between the bars, I still get 2 groups with blank values

  • If there is a string of text without a comma, that string gets captured as group one, and group 2 is captured with a blank value
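To make the failure concrete, here is a minimal sketch of how the current pattern behaves on the three inputs above. It assumes Python's `re` module purely for illustration (the application's actual language is not stated), and the name `ORIGINAL` is hypothetical.

```python
import re

# The field pattern from above. It requires a comma, so anything
# without one cannot match at all.
ORIGINAL = re.compile(r"\|(.+),(.+[^|]+)\|")

# Expected case: both groups are captured.
print(ORIGINAL.search("|Boston, MA|").groups())

# Problem cases: no comma between the bars, so there is no match
# and therefore no groups to read.
print(ORIGINAL.search("||"))
print(ORIGINAL.search("|unknown|"))
```

In this sketch the two problem inputs produce no match at all, which is why the rest of the line cannot be processed either.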


Solution

  • With this regex you should get what you expect:

    \|\s*([^,|]*?)\s*(?:,\s*([^|]*?)\s*)?\|
    

    It fills groups 1 and 2, trimming surrounding white space. If there is no comma, group 2 is empty (some engines report it as unset or null rather than as an empty string). If there is nothing (or only white space) between the bars, both groups are empty.

    If you want the white space to be part of the matches, the regex would look like this:

    \|([^,|]*)(?:,([^|]*))?\|
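As a rough check of both patterns, the sketch below runs them against the inputs from the question. It assumes Python's `re` module, which the question does not specify, and the names `TRIMMED`, `RAW` and `split_city_state` are hypothetical helpers introduced only for this example. In Python an optional group that did not take part in the match is `None`, so the helper asks for `""` as the default to get the blank value the question wants.

```python
import re

# Pattern that trims white space around city and state.
TRIMMED = re.compile(r"\|\s*([^,|]*?)\s*(?:,\s*([^|]*?)\s*)?\|")
# Variant that keeps white space as part of the captured groups.
RAW = re.compile(r"\|([^,|]*)(?:,([^|]*))?\|")


def split_city_state(field, pattern=TRIMMED):
    """Return (city, state) for one bar-delimited field, or None if no match."""
    m = pattern.search(field)
    if m is None:
        return None
    # groups(default="") turns an unmatched optional group into "".
    return m.groups(default="")


for sample in ["|Boston, MA|", "||", "|unknown|", "|  |"]:
    print(repr(sample), split_city_state(sample))
```

With these inputs, `|Boston, MA|` yields `('Boston', 'MA')` for the trimmed pattern and `('Boston', ' MA')` for the raw one, while `||` and `|unknown|` yield `('', '')` and `('unknown', '')` respectively. Other engines may expose the unmatched group differently, so the normalisation step is worth checking in the target language.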