
Count the unique columns and their frequencies in Linux


I have a file, data.txt, containing a 4 × 9 matrix:

101000110
000000010
001010010
100101101

I want to count how often each unique column occurs. The expected result is:

1001 2
0000 1
1010 1
0001 3 
0010 1
1110 1

On the Internet I can only find ways to get "unique lines according to specific columns" with awk. Do I need to transpose my data first to solve this problem, or is there a more direct way? Thank you.


Solution

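  • This awk script will do it. Setting FS to the empty string makes awk treat every character as a separate field (this works in gawk and most other implementations, although POSIX leaves the behavior unspecified):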

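    awk '{for (i=1;i<=NF;i++){
             a[i]=a[i]""$i      # append the current digit to the string for column i
           }
         }
         END{
         for (i=1;i<=9;i++) {   # 9 is the number of columns in this example
           res[a[i]]++          # count how often each column string occurs
           }
         for (r in res){
             print r, res[r]    # the order of "for (r in res)" is unspecified
           }
         }' FS= yourfile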
    

    Result

    1110 1
    0000 1
    0010 1
    0001 3
    1010 1
    1001 2
    

    Explanation

    for (i=1;i<=NF;i++){
             a[i]=a[i]""$i
           }
    

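    This appends each field to the string for its column. Once the last line has been read, a[i] holds the i-th column read from top to bottom (for example, a[1] is 1001). Because the matrix is regular, every row contributes exactly one character to each of the nine columns.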

     for (i=1;i<=9;i++) {
       res[a[i]]++
       }
    

    This uses each column string as a key in the associative array res and increments its count, so res[a[i]] ends up holding the number of times that column occurs.

     for (r in res){
         print r, res[r] 
       }
    

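    Finally, this prints each unique column with its count. awk does not guarantee the order of a for (r in res) loop, so the lines may come out in a different order than shown above.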
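
The same idea can be sketched without a fixed column count and without relying on an empty FS. This is only an illustration, not the answer's original script: the function name column_freqs is made up here, substr() is swapped in for field splitting so the result does not depend on how a particular awk handles FS="", and the output is piped through sort so the order is stable.

```shell
# column_freqs FILE: count identical columns of a character matrix.
# Hypothetical helper name; each input line is one row of the matrix.
column_freqs() {
  awk '{
         for (i = 1; i <= length($0); i++)
           col[i] = col[i] substr($0, i, 1)   # build column i from top to bottom
         if (length($0) > n) n = length($0)   # widest row seen so far
       }
       END {
         for (i = 1; i <= n; i++) count[col[i]]++   # tally identical column strings
         for (c in count) print c, count[c]
       }' "$1" | sort
}

# Usage with the matrix from the question, written to a temporary file
sample=$(mktemp)
printf '%s\n' 101000110 000000010 001010010 100101101 > "$sample"
column_freqs "$sample"
rm -f "$sample"
```

For the sample matrix this prints the six unique columns in sorted order, from 0000 1 through 1110 1, with the same counts as the result above. If the rows had different lengths, shorter rows would simply contribute nothing to the trailing columns, which may or may not be what you want.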