I'm a Python beginner and I've come across this problem, and I'm not sure how to go about tackling it.
If I have the following sequences (strings):
GATCCG
GTACGC
How do I count how often each letter occurs at each position? For example, G occurs at position 1 twice across the two sequences, A occurs at position 1 zero times, etc.
Any help would be appreciated, thank you!
You can use a combination of defaultdict and enumerate, like so:
from collections import defaultdict

sequences = ['GATCCG', 'GTACGC']
d = defaultdict(lambda: defaultdict(int))  # d[char][position] = count
for seq in sequences:
    for i, char in enumerate(seq):  # enumerate('abc') yields (0, 'a'), (1, 'b'), (2, 'c')
        d[char][i] += 1

d['C'][3]  # 2
d['C'][4]  # 1
d['C'][5]  # 1
This builds a nested defaultdict that takes the character as the first key and the zero-based position as the second key, and gives the number of times that character occurs at that position.
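One thing to watch for: looking up a missing key with d[char][i] silently adds it to the defaultdict. Here is a minimal sketch of a read-only lookup that also accepts the 1-based positions used in the question. The helper name count_at is just something I made up for illustration, not part of any library:

```python
from collections import defaultdict

sequences = ['GATCCG', 'GTACGC']
d = defaultdict(lambda: defaultdict(int))  # d[char][position] = count
for seq in sequences:
    for i, char in enumerate(seq):
        d[char][i] += 1

def count_at(counts, char, position):
    """Count of `char` at the 1-based `position` (hypothetical helper).

    Uses .get() so that asking about a missing key does not insert it.
    """
    return counts.get(char, {}).get(position - 1, 0)

print(count_at(d, 'G', 1))  # 2
print(count_at(d, 'A', 1))  # 0
```

Using .get() instead of indexing keeps the dictionary unchanged when you only want to read from it.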
If you would rather have a list of per-position counts for each character:
max_len = max(map(len, sequences))
d = defaultdict(lambda: [0] * max_len)  # d[char] = [pos0, pos1, ...]
for seq in sequences:
    for i, char in enumerate(seq):
        d[char][i] += 1

d['G']  # [2, 0, 0, 0, 1, 1]
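If it is more natural to look things up by position first and letter second, a possible alternative (a different technique from the defaultdict approach, and assuming all sequences have the same length) is to transpose the sequences with zip and build one Counter per position. The name position_counts is my own:

```python
from collections import Counter

sequences = ['GATCCG', 'GTACGC']

# zip(*sequences) yields one tuple of letters per position (column).
# It stops at the shortest sequence, so equal lengths are assumed here.
position_counts = [Counter(column) for column in zip(*sequences)]

print(position_counts[0]['G'])  # 2
print(position_counts[0]['A'])  # 0
print(position_counts[3]['C'])  # 2
```

A Counter returns 0 for letters it has not seen, so asking about a letter that never occurs at a position does not raise a KeyError.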