perl unix grep tail

grep the 3 latest occurrences and some lines around each occurrence


I have a file like:

exception: anythinggggg...
exception: anythinggggg...
abchdhjsdhsd
ygsuhesnkc
exception: anythingggg...
exception: anything...
..
..

I want to grep the latest 2 occurrences of exception keyword along with 3 lines before and 3 lines after it.

I am using something like

grep -C 3 exception | tail -12

I am using tail -12 here as I want 6 lines per occurrence and the latest 2 occurrences. This works fine when the occurrences of exception are far apart, but gives me useless lines if, say, both occurrences are consecutive.

abdgjsd
abdgjsd
abdgjsd
abdgjsd
abdgjsd
abdgjsd
abdgjsd
abdgjsd
exception
exception
exception
abcd

In the above case, it gives me

abdgjsd
abdgjsd
abdgjsd
exception
exception
exception
abcd

however, what I want is

abdgjsd
exception
exception -----------------> OUTPUT FOR FIRST OCCURRENCE
exception
abcd

abdgjsd
abdgjsd
exception-----------------> OUTPUT FOR SECOND OCCURRENCE
exception
exception
abcd

Is there another way to do this? Probably something in which I can also specify the number of occurrences, rather than just grepping lines and tailing some of the output.


Solution

  • The output you get is because grep never prints a line twice: when the context (-C) of one match overlaps the next match or its context, grep merges them into a single block. I don't see how to make it behave otherwise.

    The script below (written on the command line) reads the whole file and forms an array of lines. Then it goes through the array and, for each match, prints the match with up to two lines of context on either side, stopping at the start or end of the array.

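    perl -MList::Util=min,max -0777 -wnE'
        my @m = split /\n/;                  # slurped file as an array of lines
        for my $i (0..$#m) {
            next unless $m[$i] =~ /exception/;
            my $bi = max(0, $i-2);           # first index to print, clamped at the start
            my $ei = min($i+2, $#m);         # last index to print, clamped at the end
            say for @m[$bi..$ei];
            say "---";
        }
    ' input.txt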
    

    The --- are printed for easier reviewing of output. This prints the desired output.

    The -0777 option makes it slurp the whole file into the $_ variable, which is split by newline. The iteration goes over the array indices ($#m is the index of the last element of @m). The $bi and $ei are the begin/end indices to print; they are clamped to the array bounds, since a full +/- 2 range is not available when a match is within two lines of the beginning or end of the array.

    The output can be piped to tail, but this can't be automated: if a match is within the last two lines there will be (one or two) fewer lines of output, so the input would need to be known for a precise cut-off. Alternatively, find the indices of the matches in the script, @idx = grep { $m[$_] =~ /exception/ } 0..$#m;, and use that list to print only the last two.
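
    The index-based idea can be sketched as a small sub. This is only an illustration under assumptions: the name last_matches_with_context and its parameters (context size, number of matches) are made up for this example and are not part of the one-liner above.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper (name and parameters are assumptions for this sketch).
# Returns one block (array ref of lines) per match, for the last $count
# matches only, each with up to $ctx lines of context on either side.
sub last_matches_with_context {
    my ($lines, $pattern, $ctx, $count) = @_;

    # Indices of all matching lines, in file order
    my @idx = grep { $lines->[$_] =~ $pattern } 0 .. $#$lines;

    # Keep only the last $count indices (fewer if there are not that many)
    @idx = @idx[ max(0, @idx - $count) .. $#idx ];

    my @blocks;
    for my $i (@idx) {
        my $begin = max(0, $i - $ctx);          # clamp at start of array
        my $end   = min($i + $ctx, $#$lines);   # clamp at end of array
        push @blocks, [ @{$lines}[$begin .. $end] ];
    }
    return @blocks;
}

# Demo on the sample from the question: 8 filler lines, 3 matches, 1 trailer
my @lines = (('abdgjsd') x 8, qw(exception exception exception abcd));
for my $block (last_matches_with_context(\@lines, qr/exception/, 3, 2)) {
    print "$_\n" for @$block;
    print "---\n";
}
```

    Because the cut-off is applied to the list of match indices rather than to the printed lines, it does not matter how many context lines each block ends up with.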

    If you are going to use something like this I'd make it a script. Then read all lines into an array directly, provide command-line options (like -C in grep), etc.
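
    A rough sketch of what such a script might look like, assuming a single -C option in the spirit of grep and a pattern plus a file name as positional arguments. The sub name run and the usage text are hypothetical; Getopt::Long and List::Util are core modules.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);
use List::Util qw(min max);

# Hypothetical script layout: the -C option mimics grep, everything else
# (sub name, argument order, usage text) is an assumption for this sketch.
sub run {
    my @args = @_;
    my $ctx = 2;                              # default context size
    GetOptionsFromArray(\@args, 'C=i' => \$ctx)
        or die "usage: $0 [-C lines] pattern file\n";
    my ($pattern, $file) = @args;
    die "usage: $0 [-C lines] pattern file\n" unless defined $file;

    open my $fh, '<', $file or die "Cannot open $file: $!";
    chomp(my @m = <$fh>);                     # read all lines into an array directly
    close $fh;

    my @out;
    for my $i (grep { $m[$_] =~ /$pattern/ } 0 .. $#m) {
        # Match plus context, clamped to the array bounds, then a separator
        push @out, @m[ max(0, $i - $ctx) .. min($i + $ctx, $#m) ], '---';
    }
    return map { "$_\n" } @out;
}

# Only act as a script when arguments are given
print run(@ARGV) if @ARGV;
```

    Saved under a name of your choice, it would be invoked along the lines of: perl script.pl -C 3 exception input.txt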

    Maintaining line-by-line processing would make the job far more complicated. We need to keep track of a match so that we can print the following lines once we read them. But here we need multiple such records -- for the next match(es) as well, if they come within the following lines to be printed.
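
    To show what that bookkeeping involves, here is a minimal sketch of a line-by-line version. It keeps a sliding window of the previous lines plus a list of "open" records that are still waiting for their trailing lines. The sub name stream_matches_with_context and the record layout are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (name and record layout are assumptions).
# Reads from filehandle $fh line by line and returns one block (array ref)
# per match, with up to $ctx lines before and after the matching line.
sub stream_matches_with_context {
    my ($fh, $pattern, $ctx) = @_;
    my @before;    # sliding window of the last $ctx lines seen
    my @open;      # blocks still waiting for trailing context lines
    my @done;      # completed blocks, in match order

    while (my $line = <$fh>) {
        chomp $line;

        # Feed the current line to every block that is still open
        for my $rec (@open) {
            push @{ $rec->{lines} }, $line;
            $rec->{need}--;
        }
        # Move blocks that now have all their trailing lines to @done
        push @done, map { $_->{lines} } grep { $_->{need} == 0 } @open;
        @open = grep { $_->{need} > 0 } @open;

        # A match opens a new block seeded with the leading context
        if ($line =~ $pattern) {
            my $rec = { lines => [ @before, $line ], need => $ctx };
            if ($ctx > 0) { push @open, $rec }
            else          { push @done, $rec->{lines} }
        }

        # Update the sliding window
        push @before, $line;
        shift @before if @before > $ctx;
    }

    # Blocks cut short by end of input are kept as they are
    push @done, map { $_->{lines} } @open;
    return @done;
}

# Demo with an in-memory filehandle
my $text = join("\n", ('abdgjsd') x 8, qw(exception exception exception abcd)) . "\n";
open my $in, '<', \$text or die "Cannot open in-memory handle: $!";
for my $block (stream_matches_with_context($in, qr/exception/, 2)) {
    print "$_\n" for @$block;
    print "---\n";
}
```

    Memory use here is bounded by the context size and the number of overlapping matches, not by the file size, which is the main reason to accept the extra complexity. Keeping only the last few matches would still require holding the returned blocks, or trimming @done as it grows.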