bash awk sed grep sh

Check if file contains same text in consecutive lines


I want to use bash to check whether a log file has any instance where two or more consecutive lines contain the same text. The text to look for will be specified. The timestamp, and any text after the third field, should be ignored in the comparison.

e.g. something like grep ... "error" /tmp/file.txt

This file should match:

2020-01-01 05:05 text1
2020-01-01 05:07 error
2020-01-01 05:15 error
2020-01-01 05:25 error
2020-01-01 05:45 text2

This one should not:

2020-01-01 05:05 text1
2020-01-01 05:15 error
2020-01-01 05:25 text2
2020-01-01 05:45 error
2020-01-01 05:05 text3

Any ideas using grep, sed or awk? Ideally I'd like an exit status of 0 for a match and 1 for no match.


Solution

  • Looks like uniq does most of what you need.

    -d, --repeated
    only print duplicate lines, one for each group

    -s, --skip-chars=N
    avoid comparing the first N characters

    The timestamp ("2020-01-01 05:05 ", including its trailing space) is 17 characters long, so this should work for you:

    uniq --skip-chars=17 -d /tmp/file.txt
    

    Tested on my machine:

    $ cat in.txt 
    2020-01-01 05:05 text1
    2020-01-01 05:07 error
    2020-01-01 05:15 error
    2020-01-01 05:25 error
    2020-01-01 05:45 text2
    
    $ uniq --skip-chars=17 -d in.txt 
    2020-01-01 05:07 error
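uniq exits with status 0 whether or not it finds duplicates, and as written it reports every repeated message rather than only the specified one, so it does not by itself give the 0/1 exit status asked for in the question. Below is a minimal sketch of two wrapper functions under stated assumptions: has_consecutive (awk) assumes the message is always the third whitespace-separated field, and has_consecutive_uniq builds on the uniq command above and assumes a fixed 17-character timestamp. Both function names are hypothetical.

```shell
#!/bin/sh

# has_consecutive TEXT FILE  (hypothetical helper)
# Exit 0 if at least two consecutive lines of FILE have TEXT as their third
# whitespace-separated field, exit 1 otherwise. The date and time (fields 1-2)
# and anything after the third field are ignored.
has_consecutive() {
    awk -v want="$1" '
        # Current and previous line both carry the wanted text: stop early.
        $3 == want && prev == want { found = 1; exit }
        # Remember the third field of this line for the next comparison.
        { prev = $3 }
        # Translate the result into the requested exit status.
        END { exit (found ? 0 : 1) }
    ' "$2"
}

# has_consecutive_uniq TEXT FILE  (hypothetical helper)
# Same check built on uniq: assumes a fixed 17-character timestamp and
# compares the whole rest of the line, not just the third field.
has_consecutive_uniq() {
    # uniq -d prints one line per repeated group; cut drops the timestamp;
    # grep -qxF succeeds only if some group's text is exactly TEXT.
    uniq -s 17 -d "$2" | cut -c 18- | grep -qxF -- "$1"
}

# Example usage:
# if has_consecutive error /tmp/file.txt; then echo "repeated"; fi
```

The awk version matches the question's "ignore everything after the third field" rule exactly and exits as soon as it finds a match; the uniq version is shorter but depends on the timestamp width never changing.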