I'm trying to create new sequences by shuffling conserved motifs among known protein sequences.
For example:
seq1 = ['ABC', 'DEF', 'GHI']
seq2 = ['JKL', 'MNO', 'PUR']
seq3 = ['QRS', 'TUV', 'WXY']
The result I am looking for is:
ABC DEF PUR
ABC DEF WXY
ABC MNO GHI
ABC MNO WXY
ABC MNO PUR
JKL MNO GHI
JKL MNO WXY
JKL DEF GHI
JKL DEF WXY
JKL DEF PUR
...
for a total of 3^3 = 27 combinations.
I already tried using functions in the itertools module (combinations, product, etc.), and nothing gives the desired result.
I am new to programming, so there may be something very obvious that I am missing...
If I understand correctly, you are looking for the Cartesian product of the motifs grouped by position across the lists. For this, zip the sequences together, which groups the motifs found at each position, and pass those groups to itertools.product.
In[1]: from itertools import product
In[2]: seq1 = ['ABC', 'DEF', 'GHI']
In[3]: seq2 = ['JKL', 'MNO', 'PUR']
In[4]: seq3 = ['QRS', 'TUV', 'WXY']
In[5]: list(product(*zip(seq1, seq2, seq3)))
Out[5]:
[('ABC', 'DEF', 'GHI'),
('ABC', 'DEF', 'PUR'),
('ABC', 'DEF', 'WXY'),
('ABC', 'MNO', 'GHI'),
('ABC', 'MNO', 'PUR'),
('ABC', 'MNO', 'WXY'),
('ABC', 'TUV', 'GHI'),
('ABC', 'TUV', 'PUR'),
('ABC', 'TUV', 'WXY'),
('JKL', 'DEF', 'GHI'),
('JKL', 'DEF', 'PUR'),
('JKL', 'DEF', 'WXY'),
('JKL', 'MNO', 'GHI'),
('JKL', 'MNO', 'PUR'),
('JKL', 'MNO', 'WXY'),
('JKL', 'TUV', 'GHI'),
('JKL', 'TUV', 'PUR'),
('JKL', 'TUV', 'WXY'),
('QRS', 'DEF', 'GHI'),
('QRS', 'DEF', 'PUR'),
('QRS', 'DEF', 'WXY'),
('QRS', 'MNO', 'GHI'),
('QRS', 'MNO', 'PUR'),
('QRS', 'MNO', 'WXY'),
('QRS', 'TUV', 'GHI'),
('QRS', 'TUV', 'PUR'),
('QRS', 'TUV', 'WXY')]
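As a self-contained sketch, the same zip-then-product idea can be wrapped in a function that accepts any number of motif lists and joins each combination into a space-separated string like the ones in the question. The helper name `shuffle_motifs` is hypothetical (it is not part of itertools), and it assumes every sequence has the same number of motifs, because zip stops at the shortest list.

```python
from itertools import product

# Motif lists from the question; each list is one known protein split into motifs.
seq1 = ['ABC', 'DEF', 'GHI']
seq2 = ['JKL', 'MNO', 'PUR']
seq3 = ['QRS', 'TUV', 'WXY']


def shuffle_motifs(*seqs):
    """Hypothetical helper: yield every sequence formed by choosing, at each
    motif position, the motif from any one of the input sequences.

    Assumes all sequences have the same number of motifs (zip truncates
    to the shortest one otherwise).
    """
    # zip groups motifs by position: ('ABC', 'JKL', 'QRS'), ('DEF', 'MNO', 'TUV'), ...
    positions = zip(*seqs)
    # product picks one motif from each positional group, in lexicographic order
    for combo in product(*positions):
        yield ' '.join(combo)


results = list(shuffle_motifs(seq1, seq2, seq3))
print(len(results))   # 27
print(results[:3])    # ['ABC DEF GHI', 'ABC DEF PUR', 'ABC DEF WXY']
```

Because product yields its tuples lazily, you can loop over `shuffle_motifs(...)` directly instead of building a list when the number of sequences or motifs grows and the result count (sequences ** positions) becomes large.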