Perl optional capture groups not working?


I have the following sample.txt file:

2021-10-07 10:32:05,767 ERROR [LAWT2] blah.blah.blah - Message processing FAILED: <ExecutionReport blah="xxx" foo="yyy" SessionID="kkk" MoreStuff="zz"> Total time for which application threads were stopped: 0.0003858 seconds, Stopping threads took: 0.0000653 seconds
2021-10-07 10:31:32,902 ERROR [LAWT6] blah.blah.blah - Message processing FAILED: <NewOrderSingle SessionID="zkx" TargetSubID="ttt" Account="blah" MsgType="D" BookingTypeOverride="0" Symbol="6316" OtherField1="othervalue1" Otherfield2="othervalue2"/></D></NewOrderSingle>

I want to grab just two key fields: "SessionID" and "MsgType" and print like this:

SessionID="kkk"|
SessionID="zkx"|MsgType="D"

In other words: if a group does not match, I just want to print an empty field in its place.

I've tried the following approach but no luck:

$ perl -ne '/ (SessionID=".*?")? .*(MsgType=".*?")? / and print "$1|$2\n"' sample.txt
SessionID="kkk"|
SessionID="zkx"|

Can somebody enlighten me here? Thank you a lot.


Solution

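  • Your pattern fails because (MsgType=".*?")? is optional and follows a greedy .*: the .* first runs to the end of the line and then backtracks only as far as the last space, where the optional group can match nothing and the final space matches. The regex therefore succeeds before MsgType is ever tried at its real position, and $2 stays unset.

    You can use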

    perl -ne '/\h(SessionID="[^"]*")?(?:\h++.*(MsgType="[^"]*"))?\h/ and print "$1|$2\n"' sample.txt
    

    Details:

    • \h - a horizontal whitespace
    • (SessionID="[^"]*")? - Group 1: an optional SessionID=", any zero or more chars other than ", and then a "
    • (?:\h++.*(MsgType="[^"]*"))? - an optional (but greedy) sequence of
      • \h++ - one or more horizontal whitespaces (matched possessively)
      • .* - any zero or more chars other than line break chars, as many as possible
      • (MsgType="[^"]*") - Group 2: MsgType=", any zero or more chars other than ", and then a "
    • \h - a horizontal whitespace.

    See the online demo:

    s='2021-10-07 10:32:05,767 ERROR [LAWT2] blah.blah.blah - Message processing FAILED: <ExecutionReport blah="xxx" foo="yyy" SessionID="kkk" MoreStuff="zz"> Total time for which application threads were stopped: 0.0003858 seconds, Stopping threads took: 0.0000653 seconds
    2021-10-07 10:31:32,902 ERROR [LAWT6] blah.blah.blah - Message processing FAILED: <NewOrderSingle SessionID="zkx" TargetSubID="ttt" Account="blah" MsgType="D" BookingTypeOverride="0" Symbol="6316" OtherField1="othervalue1" Otherfield2="othervalue2"/></D></NewOrderSingle>'
    perl -ne '/\h(SessionID="[^"]*")?(?:\h++.*(MsgType="[^"]*"))?\h/ and print "$1|$2\n"' <<< "$s"
    

    This prints:

    SessionID="kkk"|
    SessionID="zkx"|MsgType="D"