
Counting the number of times a letter occurs at a certain position using Python


I'm a Python beginner. I've come across this problem and I'm not sure how to tackle it.

If I have the following sequence/strings:

GATCCG

GTACGC

How do I count how often each letter occurs at each position? For example, G occurs at position 1 twice across the two sequences, A occurs at position 1 zero times, etc.

Any help would be appreciated, thank you!


Solution

  • You can use a combination of defaultdict and enumerate like so:

    from collections import defaultdict
    
    sequences = ['GATCCG', 'GTACGC']
    d = defaultdict(lambda: defaultdict(int))  # d[char][position] = count
    for seq in sequences:
        for i, char in enumerate(seq):  # enumerate('abc') yields (0, 'a'), (1, 'b'), (2, 'c')
            d[char][i] += 1
    
    d['C'][3]  # 2
    d['C'][4]  # 1
    d['C'][5]  # 1
    

    This builds a nested defaultdict keyed first by character and then by position; each value is the number of times that character occurs at that position. Positions are zero-based, so "position 1" in the question is index 0 here.

    If you want lists of position-counts:

    max_len = max(map(len, sequences))
    d = defaultdict(lambda: [0] * max_len)  # d[char] = [count at pos 0, count at pos 1, ...]
    for seq in sequences:
        for i, char in enumerate(seq):
            d[char][i] += 1
    
    d['G']  # [2, 0, 0, 0, 1, 1]
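
If you would rather look things up by position first and letter second, one possible alternative (a sketch, not part of the answer above) is to keep one `Counter` per position. The name `position_counts` is just an illustrative choice, and `zip_longest` is used here on the assumption that sequences may differ in length; with equal-length sequences a plain `zip` would behave the same way.

```python
from collections import Counter
from itertools import zip_longest

sequences = ['GATCCG', 'GTACGC']

# zip_longest walks the sequences column by column; shorter sequences are
# padded with None, which is filtered out before counting.
position_counts = [
    Counter(char for char in column if char is not None)
    for column in zip_longest(*sequences)
]

print(position_counts[0]['G'])  # 2
print(position_counts[0]['A'])  # 0
print(position_counts[3]['C'])  # 2
```

A `Counter` returns 0 for letters it has never seen and does not add them as keys, so looking up a letter that never occurs at a position leaves the data unchanged.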