I have been trying to count the number of 1s for each species in a FASTA file that looks like this:
>111
1100101010
>102
1110000001
The desired output would be:
>111
5
>102
4
I know how to get the total number of 1s in the sequence lines of a file with:
grep -v '^>' file | grep -o 1 | wc -l
My problem is that I cannot find a way to keep track of the number of 1s for each species (instead of the total for the file).
A FASTA sequence may be wrapped across several lines, so the record
>111
11001010101110000001
can also be written as
>111
1100101010
1110000001
but none of the existing solutions handle the latter. The following addresses that oversight:
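perl -Mv5.10 -ne'
   if ( /^>/ ) {
      say $c if defined $c;   # Print the count for the previous record.
      $c = 0;
      print;                  # Echo the header line.
   } else {
      $c += tr/1//;           # tr returns the number of 1s on this line.
   }
   END {
      say $c if defined $c;   # Print the count for the last record.
   }
' file.fasta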
For both files shown above, the program outputs
>111
9
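If Perl is not at hand, the same idea can be sketched in awk. Treat this as a rough equivalent rather than a drop-in replacement: the function name count_ones_per_record is made up for illustration, and the sketch relies on awk's gsub returning the number of substitutions it performed, which here equals the number of 1s on the line. The count is accumulated across all sequence lines until the next header, so wrapped records are handled too.

```shell
# Hypothetical helper: print each FASTA header followed by the number of 1s
# in that record, summed over every sequence line that belongs to it.
# Reads the files given as arguments, or standard input if there are none.
count_ones_per_record() {
  awk '
    /^>/ {                          # a header line starts a new record
      if (seen) print count         # flush the previous record, if any
      print                         # echo the header itself
      count = 0
      seen = 1
      next
    }
    { count += gsub(/1/, "1") }     # gsub returns the number of replacements
    END { if (seen) print count }   # flush the last record
  ' "$@"
}

# Example: one wrapped record and one single-line record.
printf '>111\n1100101010\n1110000001\n>102\n1110000001\n' | count_ones_per_record
```

For that input it should print >111, 9, >102 and 4, each on its own line. Replacing each 1 with itself leaves the line unchanged, so gsub is used here purely as a counter.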