sorting grep uniq wc

Why do these two grep commands give different results?


I have a large log file containing lines for a particular task as follows:

[info] My task : 123
[info] Other task : 111
[info] My task : 456
[info] My task : 456
[info] My task : 789

I want to count the number of unique "My task" lines logged, which in this case should be 3.

I have used these two commands which, in my opinion, should both give the same, correct result:

grep 'My task :' FILE | uniq | wc -l
grep -E 'My task :' FILE | sort --unique | grep -cE 'My task :'

The two commands give the same result on the small test files I create, but different results on the large log file on the server, and I cannot understand why. To be exact, the first command gives a count of ~33k while the second one gives ~15k. Which of the two commands, if either, is correct? And what should I ideally be doing?


Solution

  • This most likely happens because uniq only removes adjacent duplicate lines, so identical lines that are separated by other lines are all kept. For example, if your file looks like this:

    [info] My task : 123
    [info] Other task : 111
    [info] My task : 456
    [info] My task : 456
    [info] My task : 789
    
    [info] My task : 123
    [info] Other task : 111
    [info] My task : 456
    [info] My task : 456
    [info] My task : 789
    
    [info] My task : 123
    [info] Other task : 111
    [info] My task : 456
    [info] My task : 456
    [info] My task : 789
    
    [info] My task : 123
    [info] Other task : 111
    [info] My task : 456
    [info] My task : 456
    [info] My task : 789
    
    [info] My task : 123
    [info] Other task : 111
    [info] My task : 456
    [info] My task : 456
    [info] My task : 789
    

    the results will differ:

    $ grep 'My task :' FILE | uniq | wc -l
    15
    $ grep -E 'My task :' FILE | sort --unique  | wc -l
    3
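
    Since the goal is the number of distinct "My task" lines, the sorted count is the one you want. Below is a minimal sketch of two ways to get it, assuming the log file path is passed as the first argument; the function names count_unique_tasks and count_unique_tasks_awk are made up for illustration. The awk variant skips the sort by remembering every line it has already seen, which may help on a very large file, at the cost of keeping all distinct lines in memory.

```shell
# Hypothetical helper: count distinct "My task" lines in the file given as $1.
count_unique_tasks() {
    # sort brings identical lines together; -u keeps a single copy of each
    grep 'My task :' "$1" | sort -u | wc -l
}

# Same count without sorting: the seen[] array records each line on first sight,
# so n is only incremented once per distinct matching line.
count_unique_tasks_awk() {
    awk '/My task :/ && !seen[$0]++ { n++ } END { print n + 0 }' "$1"
}
```

    Usage would be something like count_unique_tasks FILE. Note that both versions compare whole lines, so two lines with the same task number but a different timestamp or trailing whitespace would still be counted separately.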