Create combinations between lists based on sequence position


I'm trying to create new sequences by shuffling conserved motifs between known protein sequences, keeping each motif at its original position.

For example:

seq1 = ['ABC', 'DEF', 'GHI']
seq2 = ['JKL', 'MNO', 'PUR']
seq3 = ['QRS', 'TUV', 'WXY']

The result I am looking for is:

ABC DEF PUR
ABC DEF WXY 
ABC MNO GHI
ABC MNO WXY
ABC MNO PUR 
JKL MNO GHI 
JKL MNO WXY 
JKL DEF GHI 
JKL DEF WXY 
JKL DEF PUR 
...

for a total of 3^3 combinations.

I already tried using functions in the itertools module (combinations, product, etc.), but none of them gives the desired result.

I am new to programming, so there may be something very obvious that I am missing...


Solution

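  • If I understand correctly, you are looking for the Cartesian product of the motifs grouped by position: any of the three first motifs, followed by any of the three second motifs, followed by any of the three third motifs. Zipping the sequences together produces those per-position groups, and itertools.product then picks one element from each group.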

    In[1]: from itertools import product  # the built-in zip replaces Python 2's izip
    In[2]: seq1 = ['ABC', 'DEF', 'GHI']
    In[3]: seq2 = ['JKL', 'MNO', 'PUR']
    In[4]: seq3 = ['QRS', 'TUV', 'WXY']
    In[5]: list(product(*zip(seq1, seq2, seq3)))
    Out[5]: 
    [('ABC', 'DEF', 'GHI'),
     ('ABC', 'DEF', 'PUR'),
     ('ABC', 'DEF', 'WXY'),
     ('ABC', 'MNO', 'GHI'),
     ('ABC', 'MNO', 'PUR'),
     ('ABC', 'MNO', 'WXY'),
     ('ABC', 'TUV', 'GHI'),
     ('ABC', 'TUV', 'PUR'),
     ('ABC', 'TUV', 'WXY'),
     ('JKL', 'DEF', 'GHI'),
     ('JKL', 'DEF', 'PUR'),
     ('JKL', 'DEF', 'WXY'),
     ('JKL', 'MNO', 'GHI'),
     ('JKL', 'MNO', 'PUR'),
     ('JKL', 'MNO', 'WXY'),
     ('JKL', 'TUV', 'GHI'),
     ('JKL', 'TUV', 'PUR'),
     ('JKL', 'TUV', 'WXY'),
     ('QRS', 'DEF', 'GHI'),
     ('QRS', 'DEF', 'PUR'),
     ('QRS', 'DEF', 'WXY'),
     ('QRS', 'MNO', 'GHI'),
     ('QRS', 'MNO', 'PUR'),
     ('QRS', 'MNO', 'WXY'),
     ('QRS', 'TUV', 'GHI'),
     ('QRS', 'TUV', 'PUR'),
     ('QRS', 'TUV', 'WXY')]
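
To get space-separated lines like those in the question, the same idea can be wrapped in a small generator. This is only a sketch: the function name `shuffle_motifs` is hypothetical (not from the question or from itertools), and it assumes all sequences contain the same number of motifs, since `zip` silently truncates to the shortest input.

```python
from itertools import product

seq1 = ['ABC', 'DEF', 'GHI']
seq2 = ['JKL', 'MNO', 'PUR']
seq3 = ['QRS', 'TUV', 'WXY']


def shuffle_motifs(*seqs):
    """Yield each combination as a string, one motif per position.

    Hypothetical helper: at every position, the motif may come from
    any of the input sequences.
    """
    # zip(*seqs) groups the motifs by position:
    # ('ABC', 'JKL', 'QRS'), ('DEF', 'MNO', 'TUV'), ('GHI', 'PUR', 'WXY')
    for combo in product(*zip(*seqs)):
        yield ' '.join(combo)


combos = list(shuffle_motifs(seq1, seq2, seq3))
print(len(combos))  # 27, i.e. 3^3
for line in combos[:3]:
    print(line)
# ABC DEF GHI
# ABC DEF PUR
# ABC DEF WXY
```

Note that the 27 results include the three original sequences themselves (for example `ABC DEF GHI`); if only the shuffled ones are wanted, they would need to be filtered out afterwards.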