I have a list in a for loop, and it uses itertools.product() to find different combinations of letters. I want to use collections.Counter() to count the number of occurrences of each item. However, right now it prints all the different combinations of "A"s and "G"s:
['a', 'A', 'G', 'G']
['a', 'A', 'G', 'g']
['a', 'A', 'G', 'G']
['a', 'A', 'G', 'g']
['a', 'A', 'G', 'g']
#...
['a', 'G', 'A', 'G']
['a', 'G', 'a', 'g']
['a', 'G', 'A', 'G']
['a', 'G', 'a', 'G']
['a', 'G', 'a', 'G']
#...
['a', 'G', 'a', 'G']
['a', 'G', 'A', 'G']
['a', 'G', 'a', 'g']
['a', 'G', 'A', 'G']
['a', 'G', 'a', 'G']
#...
['a', 'G', 'A', 'G']
['a', 'G', 'a', 'G']
['a', 'G', 'a', 'G']
# etc.
Now, this isn't all of them, but as you can see, there are some occurrences that are the same although ordered differently, for example:
['a', 'G', 'A', 'G']
['a', 'A', 'G', 'G']
I would much prefer the latter ordering, so I want a way to print every combination with capital letters before lowercase ones and, because 'a' comes before 'g', with the letters in alphabetical order. The final product should look like ['AaGG', 'aaGg', etc]. What function or functions should I use?
This is the code that generates the data. The section marked "Counting" is what I'm having trouble with.
import itertools
from collections import Counter
parent1 = 'aaGG'
parent2 = 'AaGg'
f1 = []
f1_ = []
genotypes = []
b = []
genetics = []
g = []
idx = []
parent1 = list(itertools.combinations(parent1, 2))
del parent1[0]
del parent1[4]
parent2 = list(itertools.combinations(parent2, 2))
del parent2[0]
del parent2[4]
for x in parent1:
    f1.append(''.join(x))
for x in parent2:
    f1_.append(''.join(x))
y = list(itertools.product(f1, f1_))
for x in y:
    genotypes.append(''.join(x))
    break
genotypes = [
    thingies[0][0] + thingies[1][0] + thingies[0][1] + thingies[1][1]
    for thingies in zip(parent1, parent2)
] * 4
print 'F1', Counter(genotypes)
# Counting
for genotype in genotypes:
    alleles = list(itertools.combinations(genotype,2))
    del alleles[1]
    del alleles[3]
    for x in alleles:
        g.append(''.join(x))
for idx in g:
    if idx.lower().count("a") == idx.lower().count("g") == 1:
        break
f2 = list(itertools.product(g, g))
for x in f2:
    genetics.append(''.join(x))
for genes in genetics:
    if genes.lower().count("a") == genes.lower().count("g") == 2:
        genes = ''.join(genes)
        print Counter(genes)
I think you're looking for a custom way to define precedence. A default sort orders characters by their ASCII codes, which puts every uppercase letter before every lowercase one (so 'G' would come before 'a'). I would define a custom precedence using a dictionary:
>>> test_list = ['a', 'A', 'g', 'G']
>>> precedence_dict = {'A':0, 'a':1, 'G':2,'g':3}
>>> test_list.sort(key=lambda x: precedence_dict[x])
>>> test_list
['A', 'a', 'G', 'g']
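To make the difference from the default ordering concrete, here is a minimal sketch that wraps the same dictionary in a small helper. The name normalize_genotype is my own invention for illustration, not something from your code, and the two input strings are just sample values taken from your output.

```python
# Custom precedence: capital before lowercase for each letter, 'a' before 'g'.
precedence_dict = {'A': 0, 'a': 1, 'G': 2, 'g': 3}

def normalize_genotype(genotype):
    """Hypothetical helper: return the letters reordered by precedence_dict."""
    return ''.join(sorted(genotype, key=lambda allele: precedence_dict[allele]))

# A default sort compares character codes, so all capitals come first.
default_order = sorted(['a', 'A', 'g', 'G'])   # ['A', 'G', 'a', 'g']

# Two spellings of the same genotype collapse to one canonical string.
first = normalize_genotype('aGAG')    # 'AaGG'
second = normalize_genotype('aAGG')   # 'AaGG'

print(default_order)
print(first)
print(second)
```

Because both spellings end up as the same string, they can later be treated as one item when counting or de-duplicating.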
Edit: Your last few lines:
for genes in genetics:
    if genes.lower().count("a") == genes.lower().count("g") == 2:
        genes = ''.join(genes)
        print Counter(genes)
were not doing what you wanted them to.
Replace those lines with:
precedence_dict = {'A':0, 'a':1, 'G':2,'g':3}
for i in xrange(len(genetics)):
    # Sort the letters of each genotype by the custom precedence
    genetics[i] = list(genetics[i])
    genetics[i].sort(key=lambda x: precedence_dict[x])
    genetics[i] = ''.join(genetics[i])
# Remove duplicates with the built-in set (the old sets module is deprecated)
genetics = list(set(genetics))
genetics.sort()
print genetics
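and I think you have the correct solution. When you iterate over a list with a for loop, the loop variable is only a name bound to each item in turn; assigning to it rebinds the name and leaves the list alone. So the strings in 'genetics' were never actually modified by genes = ''.join(genes). In addition, Counter(genes) counts the letters inside one string, not how often each genotype occurs.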
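If what you ultimately want is a count per genotype, something along these lines might work. It is only a sketch: the helper name normalize and the four-item sample list are assumptions made up for illustration, standing in for your real 'genetics' list.

```python
from collections import Counter

precedence_dict = {'A': 0, 'a': 1, 'G': 2, 'g': 3}

def normalize(genotype):
    """Hypothetical helper: reorder the letters by precedence_dict."""
    return ''.join(sorted(genotype, key=lambda allele: precedence_dict[allele]))

# Made-up sample data standing in for the real 'genetics' list.
genetics = ['aGAG', 'aAGG', 'aGag', 'GgaA']

# Rebinding the loop variable does not touch the list itself.
for genes in genetics:
    genes = normalize(genes)
# genetics is still ['aGAG', 'aAGG', 'aGag', 'GgaA'] at this point.

# Building a new list keeps the normalized strings.
normalized = [normalize(genes) for genes in genetics]

# Counter over the list counts whole genotypes, not single letters.
counts = Counter(normalized)
print(counts)
```

Passing the whole list to Counter, rather than one string at a time, is what gives you one tally per genotype.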