I have this input text file:
CD196_RS15035 normal alleles
CD196_RS15035 normal alleles
CD196_RS15035 truncation in the allele
CD196_RS15035 truncation in the allele
CD196_RS15035 no stop for allele
CD196_RS15035 no stop for allele
CD196_RS16835 normal alleles
CD196_RS16835 truncation in the allele
CD196_RS16835 no stop for allele
CD196_RS16835 no stop for allele
For each value in the first column, I want to count how many times each string in the second column occurs.
I want an output text file like this:
CD196_RS15035 normal alleles 2 truncation in the allele 2 no stop for allele 2
CD196_RS16835 normal alleles 1 truncation in the allele 1 no stop for allele 2
Any tip would be helpful. Thank you.
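With GNU awk's arrays of arrays (requires gawk 4.0 or later), assuming the two columns are separated by two or more spaces:
awk -F'[ ]{2,}' '
{ a[$1][$2]++ }    # count each (column 1, column 2) pair
END {
    for (i in a) {
        printf("%s ", i)
        for (j in a[i]) printf("%s %d ", j, a[i][j])
        print ""
    }
}' test.txt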
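Output (awk does not guarantee the iteration order of for (i in a), so the entries may appear in a different order):
CD196_RS15035 normal alleles 2 no stop for allele 2 truncation in the allele 2
CD196_RS16835 normal alleles 1 no stop for allele 2 truncation in the allele 1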
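Arrays of arrays are a GNU extension, and for-in loops visit keys in an unspecified order, so here is a minimal sketch of a portable variant that should run in any POSIX awk and prints entries in first-seen order. It is only a sketch under some assumptions: count_pairs is a hypothetical helper name, the first column is assumed to contain no whitespace, and everything after the first run of spaces or tabs is treated as the second column (so it works whether the columns are separated by one space or several).

```shell
# Hypothetical helper: count_pairs FILE
# Counts each (first column, rest of line) pair, keeping first-seen order.
count_pairs() {
  awk '
    NF >= 2 {
      key = $1                                  # first column, e.g. the locus tag
      desc = $0
      sub(/^[ \t]*[^ \t]+[ \t]+/, "", desc)     # strip first column and separator
      sub(/[ \t]+$/, "", desc)                  # strip trailing whitespace
      if (!(key in seen)) {                     # remember keys in input order
        seen[key] = 1
        keys[++nk] = key
      }
      if (!((key, desc) in count)) {            # remember descriptions per key
        order[key, ++n[key]] = desc
      }
      count[key, desc]++
    }
    END {
      for (i = 1; i <= nk; i++) {
        k = keys[i]
        line = k
        for (j = 1; j <= n[k]; j++) {
          d = order[k, j]
          line = line " " d " " count[k, d]
        }
        print line
      }
    }
  ' "$1"
}
```

Calling count_pairs test.txt > output.txt would write one line per first-column value. The (key, desc) subscripts use awk's standard simulated multidimensional arrays, which is why gawk is not needed here.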