
Counting the number of unique values based on two columns in bash


I have a tab-separated file looking like this:

A 1234
A 123245
A 4546
A 1234
B 24234
B 4545
C 1234
C 1234

Output: 
A 3
B 2
C 1

Basically I need counts of unique values in the second column, grouped by the first column, all in one command with pipelines. As you can see, there can be duplicates like "A 1234". I had some ideas with awk or cut, but neither of them seems to work. They just print out all unique pairs, while I need the count of unique second-column values for each value in the first one.

awk -F" " '{print $1}' file.tsv | uniq -c
cut -d' ' -f1,2 file.tsv | sort | uniq -ci

I'd really appreciate your help! Thank you in advance.


Solution

  • With a complete awk solution, could you please try the following.

    awk 'BEGIN{FS=OFS="\t"} !found[$0]++{val[$1]++} END{for(i in val){print i,val[i]}}' Input_file
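
    With the shown samples it should produce the desired output from the question (awk's for(i in val) does not guarantee any particular order, so the lines may appear in a different order):

    A 3
    B 2
    C 1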
    

    Explanation: adding a detailed explanation for the above.

    awk '                    ##Starting awk program from here.
    BEGIN{                   ##Starting BEGIN block of this program from here.
      FS=OFS="\t"            ##Setting input and output field separators to tab.
    }
    !found[$0]++{            ##Checking condition if whole line (1st and 2nd columns together) is NOT present in found array, then do following.
      val[$1]++              ##Creating val with 1st column as index and keep increasing its value here.
    }
    END{                     ##Starting END block of this program from here.
      for(i in val){         ##Traversing through array val here.
        print i,val[i]       ##Printing i and value of val with index i here.
      }
    }
    ' Input_file             ##Mentioning Input_file name here.
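
  • Since the question already experiments with cut and uniq, a pipeline-only sketch in the same spirit could also work (this assumes the file is genuinely tab-separated and that swapping the columns at the end with awk is acceptable):

    sort -u file.tsv | cut -f1 | uniq -c | awk '{print $2, $1}'

    Here sort -u drops exact duplicate lines such as the second "A 1234", cut -f1 keeps only the first column (tab is cut's default delimiter), uniq -c counts consecutive repeats (the input is already sorted, so equal keys are adjacent), and the final awk swaps count and key to match the requested "A 3" layout.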