Tags: string, r, perl, frequency-distribution

Frequency of letters at each position in a string


I want to count how often each of four letters (A, T, G, C) occurs at every position across a set of strings:

TGAGGTAGTAGTTTGTGCTGTTAT
TAGTAGTTTGTGCTGTTA
TGAGGTAGTAGTTTGTAC
TGAGAACTGAATTCCATAGG

desired output:

  Pos1  Pos2  Pos3  and so on. 
A 0     1
T 4     0
C 0     0
G 0     3

So far I have used the R package Biostrings, which works, but I wonder whether Perl could do this as well.


Solution

  • For the record, for

    x = "TGAGGTAGTAGTTTGTGCTGTTAT
    TAGTAGTTTGTGCTGTTA
    TGAGGTAGTAGTTTGTAC
    TGAGAACTGAATTCCATAGG"
    

    a Biostrings solution is

    library(Biostrings)
    consensusMatrix(DNAStringSet(strsplit(x, "\n")[[1]]))
    

    which will be fast for millions of sequences.
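
For comparison, the same per-position table can be sketched in base R without Biostrings. This is only an illustration of the counting logic, not part of the answer above and not a substitute for `consensusMatrix` on large inputs; the names `seqs`, `chars`, `max_len`, and `counts` are made up for this example.

```r
# The four example sequences from the question.
seqs <- c("TGAGGTAGTAGTTTGTGCTGTTAT",
          "TAGTAGTTTGTGCTGTTA",
          "TGAGGTAGTAGTTTGTAC",
          "TGAGAACTGAATTCCATAGG")

# Split every sequence into single characters.
chars <- strsplit(seqs, "")
max_len <- max(lengths(chars))

# For each position, collect the letter from every sequence (NA where a
# sequence is too short) and tabulate it against the fixed alphabet.
counts <- sapply(seq_len(max_len), function(i) {
  column <- vapply(chars, function(s) s[i], character(1))
  table(factor(column, levels = c("A", "T", "C", "G")))
})
colnames(counts) <- paste0("Pos", seq_len(max_len))

# First two columns, matching the desired output in the question.
print(counts[, 1:2])
```

Sequences shorter than the longest one simply contribute nothing to the later positions, so the column totals shrink toward the end of the table.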