from collections import Counter

sample = [['CGG','ATT'],['GCGC','TAAA']]
base_counts = [[Counter(base) for base in sub] for sub in sample]
# Output 1: [[Counter({'G': 2, 'C': 1}), Counter({'T': 2, 'A': 1})], [Counter({'C': 2, 'G': 2}), Counter({'A': 3, 'T': 1})]]
base_freqs = [[{k_v[0]:k_v[1]/len(bases[i]) for i,k_v in enumerate(count.items())} for count in counts] for counts, bases in zip(base_counts, sample)]
# Output 2: [[{'C': 0.3333333333333333, 'G': 0.6666666666666666}, {'A': 0.3333333333333333, 'T': 0.6666666666666666}], [{'C': 0.5, 'G': 0.5}, {'A': 0.75, 'T': 0.25}]]
sample is the input and Output 2 is the final output of the program. The base_freqs step computes the frequency of each base (A, T, G, C) in every string of each pair in the sample. The output is correct. However, I would like to see the code written as for loops rather than as comprehensions.
This code was originally taken from the answer posted here.
Yes. Comprehensions are read from the outside in and, within a single comprehension, left to right. Let's format it a little for readability:
base_freqs = [
    [
        {
            k_v[0]: k_v[1] / len(bases[i])
            for i, k_v in enumerate(count.items())
        }
        for count in counts
    ]
    for counts, bases in zip(base_counts, sample)
]
This is equivalent to the following loops:
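base_freqs = []  # the result; must not share a name with the loop variable bases
for counts, bases in zip(base_counts, sample):  # one pair at a time
    temp_list = []
    for count in counts:  # one Counter per string in the pair
        temp_dict = {}
        for i, k_v in enumerate(count.items()):
            temp_dict[k_v[0]] = k_v[1] / len(bases[i])
        temp_list.append(temp_dict)
    base_freqs.append(temp_list)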
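One thing worth checking in both versions: bases[i] uses i, the position of a letter within count.items(), to pick a string out of the pair. That only gives the right length here because both strings in each pair happen to have the same length. With strings of different lengths the result is wrong (for [['AA', 'CCC']] the C frequency comes out as 1.5), and a string with more distinct letters than the pair has strings raises IndexError. Below is a minimal sketch of a fix, assuming the intent is to divide each count by the length of the string it was counted from; the function name base_frequencies is hypothetical.

```python
from collections import Counter


def base_frequencies(sample):
    """Hypothetical helper: for each pair, return one {base: fraction} dict per string.

    Each Counter is divided by the length of the string it was built from,
    instead of indexing the pair with the position of a letter.
    """
    base_freqs = []
    for pair in sample:
        pair_freqs = []
        for seq in pair:
            freqs = {}
            for base, n in Counter(seq).items():
                freqs[base] = n / len(seq)  # divide by this string's own length
            pair_freqs.append(freqs)
        base_freqs.append(pair_freqs)
    return base_freqs


print(base_frequencies([['CGG', 'ATT'], ['GCGC', 'TAAA']]))
print(base_frequencies([['AA', 'CCC']]))  # → [[{'A': 1.0}, {'C': 1.0}]]
```

Building the Counter inside the loop also removes the need for the separate base_counts list and the zip that keeps it aligned with sample.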
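The comprehension version is usually somewhat faster than the explicit loops. Both build exactly the same lists and dicts, but the comprehension adds each element with a dedicated bytecode instruction, whereas the loop has to look up and call the append method on every iteration, which has some overhead.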
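To check the performance difference on your own machine, a rough timeit sketch like the following can be used. The function names freqs_comprehension and freqs_loops and the enlarged input (the question's sample repeated 10,000 times) are assumptions for illustration; the absolute numbers, and the size of the gap, depend on the Python version and hardware.

```python
import timeit
from collections import Counter

# Hypothetical benchmark input: the question's sample repeated many times.
sample = [['CGG', 'ATT'], ['GCGC', 'TAAA']] * 10_000
base_counts = [[Counter(base) for base in sub] for sub in sample]


def freqs_comprehension():
    # The nested comprehension from the question.
    return [[{k_v[0]: k_v[1] / len(bases[i])
              for i, k_v in enumerate(count.items())}
             for count in counts]
            for counts, bases in zip(base_counts, sample)]


def freqs_loops():
    # The equivalent explicit loops from the answer.
    base_freqs = []
    for counts, bases in zip(base_counts, sample):
        temp_list = []
        for count in counts:
            temp_dict = {}
            for i, k_v in enumerate(count.items()):
                temp_dict[k_v[0]] = k_v[1] / len(bases[i])
            temp_list.append(temp_dict)
        base_freqs.append(temp_list)
    return base_freqs


# Both versions must produce the same result before timing them.
assert freqs_comprehension() == freqs_loops()

for fn in (freqs_comprehension, freqs_loops):
    seconds = timeit.timeit(fn, number=20)  # timeit accepts a callable as stmt
    print(f"{fn.__name__}: {seconds:.3f} s for 20 runs")
```

Expect a modest gap at most: both versions do the same divisions and dict insertions, so the saving comes only from the per-iteration method lookups and calls.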