I have a large log file containing lines for a particular task as follows:
[info] My task : 123
[info] Other task : 111
[info] My task : 456
[info] My task : 456
[info] My task : 789
I want to count the number of unique "My task" entries logged, which in this case should be 3.
I have used these two commands which, in my opinion, should give the same and correct results:
grep 'My task :' FILE | uniq | wc -l
grep -E 'My task :' FILE | sort --unique | grep -cE 'My task :'
The two commands give the same results on the small test files I create but different results on the large log file on the server. I cannot understand why. To be exact, the first command gives a count of ~33k while the second one gives ~15k. Which command of the two, if any is correct? And what should I ideally be doing?
This most likely happens because uniq only removes consecutive identical lines. Say your file looks like this:
[info] My task : 123
[info] Other task : 111
[info] My task : 456
[info] My task : 456
[info] My task : 789
[info] My task : 123
[info] Other task : 111
[info] My task : 456
[info] My task : 456
[info] My task : 789
[info] My task : 123
[info] Other task : 111
[info] My task : 456
[info] My task : 456
[info] My task : 789
[info] My task : 123
[info] Other task : 111
[info] My task : 456
[info] My task : 456
[info] My task : 789
[info] My task : 123
[info] Other task : 111
[info] My task : 456
[info] My task : 456
[info] My task : 789
then the results will differ:
$ grep 'My task :' FILE | uniq | wc -l
15
$ grep -E 'My task :' FILE | sort --unique | wc -l
3
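The fix is to sort before de-duplicating, so that identical lines become adjacent. A minimal sketch, using a small hypothetical FILE built to reproduce the effect:

```shell
# Recreate a sample log (hypothetical FILE) with a duplicate that is
# NOT adjacent, then compare the two approaches.
printf '%s\n' \
  '[info] My task : 123' \
  '[info] Other task : 111' \
  '[info] My task : 456' \
  '[info] My task : 456' \
  '[info] My task : 789' \
  '[info] My task : 123' > FILE

# uniq alone only collapses adjacent duplicates: the second 123 survives.
grep 'My task :' FILE | uniq | wc -l       # prints 4

# Sorting first groups the duplicates, so the count is correct.
grep 'My task :' FILE | sort -u | wc -l    # prints 3
```

Both `sort --unique | wc -l` and `sort | uniq | wc -l` give the correct answer; `sort -u` is simply the short form of `sort --unique`.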