I have a file data.txt containing a 4 x 9 matrix:
101000110
000000010
001010010
100101101
I want to count how often each unique column occurs. The expected result is:
1001 2
0000 1
1010 1
0001 3
0010 1
1110 1
Searching the Internet, I only found ways to get "unique lines according to specific columns" using awk. Do I need to transpose my data first to solve this problem, or is there a more direct way? Thank you.
This awk script will help:
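awk '{
    # append the i-th character of each line to the string for column i
    for (i=1;i<=NF;i++){
        a[i]=a[i]""$i
    }
}
END{
    # count each distinct column string (9 = number of columns in the sample)
    for (i=1;i<=9;i++) {
        res[a[i]]++
    }
    # print every distinct column with its count
    for (r in res){
        print r, res[r]
    }
}' FS= yourfile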
Result
1110 1
0000 1
0010 1
0001 3
1010 1
1001 2
Explanation
for (i=1;i<=NF;i++){
    a[i]=a[i]""$i
}
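Because FS is set to the empty string, every character of a line becomes its own field (this is how gawk and mawk behave; POSIX leaves an empty FS unspecified). For each line, field i is appended to a[i]. Since the matrix is regular, after the last line a[i] holds the i-th column read from top to bottom as a single string.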
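To make the intermediate state visible, here is a minimal sketch that prints a[i] for each column instead of counting. It assumes an awk that splits on characters when FS is empty (gawk and mawk do), recreates the sample matrix in a temporary file, and uses a hypothetical variable n to remember the column count:

```sh
# Recreate the sample matrix from the question in a temporary file
tmp=$(mktemp)
printf '%s\n' 101000110 000000010 001010010 100101101 > "$tmp"

# Assumption: an empty FS makes every character its own field
columns=$(awk 'BEGIN { FS = "" }
{
    n = NF                      # remember the number of columns
    for (i = 1; i <= NF; i++)
        a[i] = a[i] "" $i       # append this row to column i
}
END {
    for (i = 1; i <= n; i++)
        print i, a[i]           # column index, column as a string
}' "$tmp")

echo "$columns"
rm -f "$tmp"
```

Each printed line is one column of the matrix, so this is effectively the transposed data that the counting step works on.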
for (i=1;i<=9;i++) {
res[a[i]]++
}
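Each column string is then used as a key in the associative array res, and its counter is incremented once per occurrence. The loop bound 9 is the number of columns in this sample, so it has to be adjusted for a matrix of a different width.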
for (r in res){
print r, res[r]
}
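Finally, print each distinct column together with its count. The iteration order of for (r in res) is unspecified in awk, which is why the lines above appear in a different order than in the question.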
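If the hard-coded 9 or the empty FS is a concern, the same idea can be sketched with substr() and length(), which are part of standard awk. This is only an illustrative variant, not the script above: the names col, freq and n are made up for the example, the sample matrix is fed in through printf, and the output is piped through sort only to get a stable order.

```sh
# Count unique columns without relying on an empty FS or a fixed width
counts=$(printf '%s\n' 101000110 000000010 001010010 100101101 |
awk '{
    n = length($0)                          # width of the current row
    for (i = 1; i <= n; i++)
        col[i] = col[i] substr($0, i, 1)    # append character i to column i
}
END {
    for (i = 1; i <= n; i++)
        freq[col[i]]++                      # count identical column strings
    for (key in freq)
        print key, freq[key]
}' | sort)

echo "$counts"
```

As in the original script, this assumes every row has the same length. For a real file, replace the printf pipeline with the file name as the last argument to awk.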