Regex to search for a pattern and output multiple lines until another pattern


I have a log file in which every entry follows this pattern:
Date [FLAG] LogRequestID : Content

The Content part of each log may span multiple lines. Given a LogRequestID, I need to find every log entry with that ID and print each one in full, including its continuation lines. This has to be done with perl, awk, sed, or pcregrep.

Example input (note that there are no empty lines between the logs):

24 May 2017 17:00:06,827 [INFO] 123456 (Blah : Blah1) Service-name:: Single line content

24 May 2017 17:00:06,828 [INFO] 567890 (Blah : Blah1) Service-name:: Content( May span multiple lines)

24 May 2017 17:00:06,829 [INFO] 123456 (Blah : Blah2) Service-name: Multiple line content. Printing Object[ ID1=fac-adasd ID2=123231
ID3=123108 Status=Unknown
Code=530007 Dest=CA
]

24 May 2017 17:00:06,830 [INFO] 123456 (Blah : Blah1) Service-name:: Single line content

24 May 2017 17:00:06,831 [INFO] 567890 (Blah : Blah2) Service-name:: Content( May span multiple lines)

Given the search key 123456 I want to extract the following:

24 May 2017 17:00:06,827 [INFO] 123456 (Blah : Blah1) Service-name:: Single line content

24 May 2017 17:00:06,829 [INFO] 123456 (Blah : Blah2) Service-name: Multiple line content. Printing Object[ ID1=fac-adasd ID2=123231
ID3=123108 Status=Unknown
Code=530007 Dest=CA
]

24 May 2017 17:00:06,830 [INFO] 123456 (Blah : Blah1) Service-name:: Single line content

Using grep gives me the single-line logs in full, but only the first line of each multi-line log.

I tried using awk to check a few lines after the search pattern and stop once another log is reached, but that becomes too inefficient. I need some sort of regex that can be used with pcregrep, perl, or even awk to fetch this output.

Please help me out as I'm pretty bad with regular expressions.


Solution

  • How about this:

    awk '/^[0-9]{2}[[:space:]][[:alnum:]_]+[[:space:]][0-9]{4}/{ n = 0 }/123456/{ n = 1 }n' file
    

    Output:

        24 May 2017 17:00:06,827 [INFO] 123456 (Blah : Blah1) Service-name:: Single line content
    
        24 May 2017 17:00:06,829 [INFO] 123456 (Blah : Blah2) Service-name: Multiple line content. Printing Object[ ID1=fac-adasd ID2=123231
        ID3=123108 Status=Unknown
        Code=530007 Dest=CA
        ]
    
        24 May 2017 17:00:06,830 [INFO] 123456 (Blah : Blah1) Service-name:: Single line content
    

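    The first regex matches the date that starts each entry and resets n to zero. If that line also contains your desired ID, the second rule sets n back to one. The trailing n is a pattern with no action, so awk prints every line while n is nonzero, which means a matching entry is printed line by line until the next date line resets n. Note that /123456/ is tested against the whole line, so it also fires when the number appears anywhere in the content, not only in the LogRequestID position.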
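
A stricter variant, as a minimal sketch: it anchors the date test to the start of the line and compares the ID as a whole field instead of searching the entire line, so an ID that merely appears inside another entry's content does not start printing. The function name extract_logs is hypothetical, and the sketch assumes the flag is a single token such as [INFO], which makes the request ID the sixth whitespace-separated field. It also spells out the digit repetitions rather than using {2}, since some older awk implementations do not support interval expressions.

```shell
# extract_logs ID FILE (hypothetical helper name)
# Print every log entry whose request-ID field equals ID,
# including the continuation lines that follow it.
extract_logs() {
    awk -v id="$1" '
        # A new entry starts with a date such as "24 May 2017 ".
        /^[0-9][0-9]? [A-Za-z]+ [0-9][0-9][0-9][0-9] / {
            # Assumption: fields are day, month, year, time, [FLAG], ID,
            # so the request ID is field 6. Appending "" forces a string
            # comparison, so IDs with leading zeros are not compared as numbers.
            printing = (($6 "") == (id ""))
        }
        # Lines without a leading date keep the current state,
        # so continuation lines follow the entry they belong to.
        printing
    ' "$2"
}
```

Called as extract_logs 123456 file, this prints the three 123456 entries from the sample input, with the multi-line entry intact.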