regex unix command-line pcre pcregrep

Matching multiline string at command line: return certain line if pattern matches, otherwise return empty string


The output of a command I have takes on the following form when it is a "success":

/ >  -------
ABC123
/ > 

It's possible for this command to emit something like this, though (a "failure"):

/ >  -------
ABC123
 -------
DEF456
 -------
Hello (world!)
 -------
(any old string, really)
/ > 

Or, this (another "failure"):

/ > / >

For the first example, I would like to emit:

ABC123

For the other two examples, I would like to emit the empty string.

I tried this, which worked great for the third example:

mycmd | pcregrep -M '(?:/\s>\s{2}-{7}\n)[^\n]*(?!\n.*\n)'

But for the first two examples it emitted:

/ >  -------
ABC123

I'm at a loss for what to do. My regex above was an attempt to match the leading / > ------- but not capture it, then match the next line only if it was not followed by another line ending with a newline. I am fine with using something other than pcregrep to solve this problem, but I am not able to express this with awk or sed. I would use Python, but it is too slow for my needs. Any help?


Solution

  • I thought the following would work, but I could not get a look-behind expression to work if it contained a newline.

    mycmd | pcregrep -M '(?<=^/ >  -{7}\n).*\n(?=/ > $)'
    

    But the following two-stage solution worked for me:

    mycmd | pcregrep -M '^/ >  -{7}\n.*\n/ > $' | pcregrep -v '^/ >'
    

    Update in response to OP's answer

    I like the \K escape :-)

    I assume you do not want to match the following situation:

    / >  -------
    / > perhaps text here
    / > 
    

    I was able to get a negative look-ahead to work when it contains \n, even when it is embedded within a positive look-ahead.

    Here is a simpler regex with \K that is closer to what you want. It disallows any content after the closing / > line, but it still allows lines before the / >  ------- header.

    mycmd | pcregrep -Mo '^/ >  -{7}\n\K(?!/ >).+(?=\n/ > $(?!\n[\s\S]))'
    

    If the captured line should be allowed to start with / >, then it is simpler:

    mycmd | pcregrep -Mo '^/ >  -{7}\n\K.+(?=\n/ > $(?!\n[\s\S]))'
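
    If Perl happens to be available, matching against the whole input at once is one way to also reject extra lines before the header, which the pcregrep patterns here still allow. This is only a sketch and not part of the original exchange: the function name extract_perl is hypothetical, and it assumes the output ends with at most one newline after the final / > prompt.

    ```shell
    # Hypothetical helper: reads the command output on stdin.
    extract_perl() {
      # -0777 slurps all of stdin into one string, so \A and \z anchor the whole output.
      # (.+) cannot cross a newline, so exactly one non-empty line may sit between
      # the header and the closing prompt.
      perl -0777 -ne 'print "$1\n" if m{\A/ >  -{7}\n(.+)\n/ > \n?\z}'
    }
    ```

    It would be used as mycmd | extract_perl, printing either the single captured line or nothing.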
    

    Final update

    Here is a sed one-liner that I believe gives the exact result, disallowing any extra lines before or after. However, it does allow capturing a line that begins with / >.

    mycmd | sed -n '1{/^\/ >  -\{7\}$/{n;/./{h;n;/^\/ > $/{${x;p}}}}}'
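
    To make the control flow easier to follow, the same script can be laid out one command per line. This is only a reformatting sketch wrapped in a hypothetical function, extract_sed; the logic is intended to be identical. Splitting it across lines also avoids relying on a closing brace directly following a command, which some non-GNU sed implementations may reject.

    ```shell
    # Hypothetical helper: reads the command output on stdin.
    extract_sed() {
      sed -n '
        # Only start matching on the very first input line.
        1{
          # Line 1 must be the header: "/ >", two spaces, seven dashes.
          /^\/ >  -\{7\}$/{
            # Read line 2; it must be non-empty.
            n
            /./{
              # Save line 2 in the hold space, then read line 3.
              h
              n
              # Line 3 must be the bare prompt with its trailing space...
              /^\/ > $/{
                # ...and it must also be the last line of input.
                ${
                  x
                  p
                }
              }
            }
          }
        }
      '
    }
    ```

    It would be used as mycmd | extract_sed.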
    

    And here is another sed solution:

    mycmd | sed -n '1{h;n;H;x;N;${/^\/ >  -\{7\}\n..*\n\/ > $/{x;p}}}'
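
    The question mentions not being able to express this in awk. One possible sketch follows, under the same assumptions as the sed versions: the whole output must be exactly three lines (the header, one non-empty line, and the bare / > prompt with its trailing space). The name extract_awk is hypothetical.

    ```shell
    # Hypothetical helper: reads the command output on stdin.
    extract_awk() {
      awk '
        # Remember only the first three lines; longer input is rejected via NR below.
        NR <= 3 { line[NR] = $0 }
        END {
          # Print the middle line only when the input is exactly header / payload / prompt.
          if (NR == 3 && line[1] == "/ >  -------" && line[2] != "" && line[3] == "/ > ")
            print line[2]
        }
      '
    }
    ```

    It would be used as mycmd | extract_awk. Plain string comparisons are used instead of a regex, so the header and prompt must match exactly, including spacing.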