I want to count the frequency of each of the four letters A, T, G and C at every position across a set of strings of unequal length:
TGAGGTAGTAGTTTGTGCTGTTAT
TAGTAGTTTGTGCTGTTA
TGAGGTAGTAGTTTGTAC
TGAGAACTGAATTCCATAGG
Desired output:
  Pos1 Pos2 Pos3 and so on.
A    0    1
T    4    0
C    0    0
G    0    3
So far I have used the R package Biostrings, which works, but I wonder whether Perl could do this as well?
For the record, for
x = "TGAGGTAGTAGTTTGTGCTGTTAT
TAGTAGTTTGTGCTGTTA
TGAGGTAGTAGTTTGTAC
TGAGAACTGAATTCCATAGG"
a Biostrings solution is
library(Biostrings)
# split x into one sequence per line, then tabulate letters by position
consensusMatrix(DNAStringSet(strsplit(x, "\n")[[1]]))
which will be fast for millions of sequences.
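For comparison, the per-position tally itself is simple enough to sketch without Biostrings. The following is a minimal Python sketch, not Perl and not how Biostrings implements it; the function name `position_counts` and the row order `ATCG` are assumptions chosen to match the desired output in the question. It walks the sequences column by column and counts letters per column, so shorter sequences simply contribute nothing past their end. The same idea (one counter per position) should carry over to a Perl hash of arrays.

```python
from collections import Counter
from itertools import zip_longest


def position_counts(seqs, alphabet="ATCG"):
    """Return {letter: [count at pos 1, count at pos 2, ...]}.

    Hypothetical helper: sequences may have unequal lengths.
    """
    # zip_longest yields one tuple per position; sequences that have
    # already ended contribute None, which is never looked up as a letter.
    columns = [Counter(col) for col in zip_longest(*seqs)]
    # Counter returns 0 for letters that do not occur in a column.
    return {letter: [col[letter] for col in columns] for letter in alphabet}


seqs = [
    "TGAGGTAGTAGTTTGTGCTGTTAT",
    "TAGTAGTTTGTGCTGTTA",
    "TGAGGTAGTAGTTTGTAC",
    "TGAGAACTGAATTCCATAGG",
]

counts = position_counts(seqs)

# Print one row per letter, one column per position.
for letter, row in counts.items():
    print(letter, *row)
```

For the four sequences in the question, the first two columns come out as in the desired output (A 0 1, T 4 0, C 0 0, G 0 3). This sketch holds all sequences in memory at once; for millions of sequences, reading line by line and updating the per-position counters incrementally would likely be the better fit.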