Tags: regex, awk, text-processing

Join line to previous line if it doesn't start with a timestamp in UNIX shell


I have a tool that outputs logs with a timestamp prefix; however, log entries may contain newlines. I would like to merge any line without a timestamp into the previous line.

Example:

[ 2020/08/12 11:40] Success with "one line [42]"
[ 2020/08/12 11:40] Success with "two
lines [13]"
[ 2020/08/12 11:40] Success with "two lines with a twist
[19] to confuse you"
[ 2020/08/12 11:41] Failure with "one line again"

Using awk, I could do something like this to merge lines that don't begin with a [ bracket:

awk -v RS="[" 'NR>1{$1=$1; print RS, $0}'

However, you can see where this fails on the "twist" entry above: its continuation line starts with a [ that isn't part of a timestamp.
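To see the failure concretely, the command above can be run on just the "twist" entry. With RS="[", every [ starts a new record, wherever it appears, so "[19]" is split off onto a line of its own. (The same thing happens to "[42]" and "[13]" in the other sample lines.) This is a small sketch that only reproduces the behaviour; it is not a fix.

```shell
#!/bin/sh
# Run the RS="[" command from the question on the "twist" entry only.
# Each "[" starts a new record, so "[19]" becomes a separate output line.
out=$(printf '%s\n' \
  '[ 2020/08/12 11:40] Success with "two lines with a twist' \
  '[19] to confuse you"' |
  awk -v RS="[" 'NR>1{$1=$1; print RS, $0}')
printf '%s\n' "$out"
# Prints:
# [ 2020/08/12 11:40] Success with "two lines with a twist
# [ 19] to confuse you"
```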

Is there a way to use a regular expression for that timestamp prefix instead? Or is there a better command line tool for accomplishing this?


Solution

  • Could you please try the following, written and tested with the shown samples at https://ideone.com/PXVCh2:

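    awk '
    {
      # A line starting with a "[ YYYY/MM/DD" timestamp begins a new entry:
      # print ORS before it (except on the first line). Any other line is a
      # continuation: print OFS (a space) so it joins the previous entry.
      printf("%s%s",$0~/^\[ [0-9]{4}\/[0-9]{2}\/[0-9]{2}/\
              ?(FNR!=1?ORS:""):OFS,$0)
    }
    END{ print "" }   # terminate the last entry with a newline
    ' Input_file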
    

    As per Ed sir's comment, I added a print "" statement in the END block so that the output ends with a newline; if you don't need that trailing newline, you can omit that part.

    Note: I wrote this on mobile and can't judge how it looks on a big screen, so I have split the single printf line into 2 lines here.
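As an alternative sketch (not part of the answer above), each entry can be buffered and printed only when the next timestamped line arrives, which avoids the ternary and the final print "". It assumes every entry starts with "[ YYYY/MM/DD" exactly as in the sample; merge_log is a hypothetical helper name. The date is spelled out as [0-9][0-9][0-9][0-9] rather than [0-9]{4}, because some awk implementations do not support interval expressions.

```shell
#!/bin/sh
# Sketch: merge continuation lines into the preceding timestamped entry.
# Assumption: every entry starts with "[ YYYY/MM/DD", as in the sample.
# merge_log is a hypothetical name; it reads files given as arguments or stdin.
merge_log() {
  awk '
    /^\[ [0-9][0-9][0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]/ {
      if (have) print buf   # flush the previous entry
      buf = $0              # start a new entry
      have = 1
      next
    }
    {
      # Continuation line: append it to the current entry with a space.
      buf = have ? (buf " " $0) : $0
      have = 1
    }
    END { if (have) print buf }   # flush the last entry
  ' "$@"
}

merge_log <<'EOF'
[ 2020/08/12 11:40] Success with "one line [42]"
[ 2020/08/12 11:40] Success with "two
lines [13]"
[ 2020/08/12 11:40] Success with "two lines with a twist
[19] to confuse you"
[ 2020/08/12 11:41] Failure with "one line again"
EOF
```

On the sample this prints four lines, one per timestamped entry. The "[19] to confuse you"" line does not match because "[" is followed by "1" rather than a space, so it is appended to the "twist" entry.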