
How to filter the list by selecting for unique combinations of characters in the elements (Python)?


I have the following pairs stored in a list:

 sample = [["CGCG", "ATAT"], ["CGCG", "CATC"], ["ATAT", "TATA"]]

Each pairwise comparison may contain only two unique combinations of characters; any pair with more is eliminated. For example:

   In sample[1]
    C       C
    G       A
    C       T 
    G       C

Look at the corresponding characters of the two strings in the sub-list: CC, GA, CT, GC.

Here, there are more than two types of pairs: (CC), (GA), (CT) and (GC). So this pairwise comparison must be eliminated.

Every comparison can have only 2 combinations out of (AA, GG, CC, TT, AT, TA, AC, CA, AG, GA, GC, CG, GT, TG, CT, TC), that is, all 16 ordered pairs of A, C, G and T.

In the above example, more than 2 such combinations are found.

However,

   In sample[0]
    C       A
    G       T
    C       A 
    G       T

There are only 2 unique combinations: CA and GT
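As a minimal sketch of the check described above, the distinct position-wise pairs can be collected with `zip` and a `set`. The helper name `column_pairs` is hypothetical and only used for illustration; the strings are assumed to have equal length.

```python
def column_pairs(s1, s2):
    """Return the distinct (char from s1, char from s2) pairs, matched by position."""
    return set(zip(s1, s2))

first = column_pairs("CGCG", "ATAT")    # sample[0]
second = column_pairs("CGCG", "CATC")   # sample[1]

print(sorted(first))   # [('C', 'A'), ('G', 'T')]
print(sorted(second))  # [('C', 'C'), ('C', 'T'), ('G', 'A'), ('G', 'C')]
```

sample[0] yields two distinct pairs and would be kept, while sample[1] yields four and would be dropped.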

Thus, the only pairs that remain are:

output = [["CGCG", "ATAT"], ["ATAT", "TATA"]]

I would prefer the code to use traditional for loops rather than comprehensions.

This is a small part of the question listed here. I am re-asking this portion because the earlier answer produced incorrect output.


Solution

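    def filter_sample(sample):
        filtered_sample = []

        for s1, s2 in sample:
            # Pair up characters by position and keep only the distinct pairs.
            pairs = set(zip(s1, s2))
            # Keep the comparison only if it has at most two distinct pairs.
            if len(pairs) <= 2:
                filtered_sample.append([s1, s2])

        return filtered_sample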
    

    Running this

    sample = [["CGCG","ATAT"],["CGCG","CATC"],["ATAT","TATA"]]
    filter_sample(sample)
    

    Returns this

    [['CGCG', 'ATAT'], ['ATAT', 'TATA']]
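Since the question asks for plain for loops, here is one possible variant that avoids comprehensions entirely. It is a sketch, not the original answer's code: the name `filter_sample_loops` and the `max_pairs` parameter are hypothetical, and skipping pairs of unequal length is an added assumption (`zip` would silently truncate to the shorter string instead).

```python
def filter_sample_loops(sample, max_pairs=2):
    """Keep [s1, s2] entries whose aligned characters form at most max_pairs distinct pairs."""
    kept = []
    for s1, s2 in sample:
        if len(s1) != len(s2):
            # Assumption: sequences of unequal length cannot be compared position by position.
            continue
        seen = set()
        for i in range(len(s1)):
            seen.add((s1[i], s2[i]))
            if len(seen) > max_pairs:
                break  # already too many distinct pairs, no need to scan further
        if len(seen) <= max_pairs:
            kept.append([s1, s2])
    return kept

sample = [["CGCG", "ATAT"], ["CGCG", "CATC"], ["ATAT", "TATA"]]
print(filter_sample_loops(sample))
# [['CGCG', 'ATAT'], ['ATAT', 'TATA']]
```

The early `break` is only an optimisation for long sequences; the result is the same as building the full set first.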