Conditional statement regarding various regex and length of a list in Python


I have the following lists:

  ['E12.2', 'E16.1', 'E15.1']
  ['E10.1', 'I11.2', 'I10.1_27353757']
  ['E16.1', 'E18.1', 'E17.3']
  ['E1.8', 'I12.1_111682336', 'I12.1_111682195']
  ['E55.1', 'E57.1', 'E56.1','E88.1']
  ['U22.3', 'U22.6_13735517', 'U23.1']

I want to keep only the lists that a) have a length of exactly 3, b) contain no '_' in any item, and c) contain no letter 'U' in any item. I am trying to do this in one line. How do I do that? The condition below already works, and I know the re module can match regexes against list items, but can I check all the conditions in a single line?

 if len(fin_list) == 3:

Solution

  • This is one possible way:

    lists = [['E12.2', 'E16.1', 'E15.1'],
             ['E10.1', 'I11.2', 'I10.1_27353757'],
             ['E16.1', 'E18.1', 'E17.3'],
             ['E1.8', 'I12.1_111682336', 'I12.1_111682195'],
             ['E55.1', 'E57.1', 'E56.1','E88.1'],
             ['U22.3', 'U22.6_13735517', 'U23.1']]
    
    for lst in lists:
        # Keep lists of length 3 in which no item contains '_' or 'U'.
        if len(lst) == 3 and not any('_' in item or 'U' in item for item in lst):
            print(lst)
    
    # Output:
    # ['E12.2', 'E16.1', 'E15.1']
    # ['E16.1', 'E18.1', 'E17.3']
    

    The interesting bit here is the use of any over a generator expression. To break it down, the generator iterates over each item in lst and tests whether _ or U is in it, yielding True or False for each item as it goes. any consumes those values and returns True as soon as it sees the first True, without testing the remaining items. If it never sees one, it returns False.
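Since the question asked for a single line, here is a minimal sketch that collapses the loop into one list comprehension and then makes the short-circuiting of any visible. The names `kept`, `seen`, and `has_bad_char` are made up for illustration and are not part of the original answer.

```python
lists = [['E12.2', 'E16.1', 'E15.1'],
         ['E10.1', 'I11.2', 'I10.1_27353757'],
         ['E55.1', 'E57.1', 'E56.1', 'E88.1'],
         ['U22.3', 'U22.6_13735517', 'U23.1']]

# One-line filter: keep lists of length 3 with no '_' and no 'U' in any item.
kept = [lst for lst in lists
        if len(lst) == 3 and not any('_' in item or 'U' in item for item in lst)]
print(kept)

# any() stops at the first True, so later items are never tested.
seen = []

def has_bad_char(item):
    seen.append(item)  # record which items were actually inspected
    return '_' in item or 'U' in item

any(has_bad_char(item) for item in ['E10.1', 'I10.1_27353757', 'E99.9'])
print(seen)  # 'E99.9' is missing because any() returned early
```

The comprehension is the same test as the loop, just collected into a list instead of printed one at a time.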

    EDIT

    Okay, we've clearly moved into "just because you can doesn't mean you should" territory, but here's a solution that incorporates the new condition introduced in the comments:

    from collections import Counter
    import re
    
    lists = [['E12.2', 'E16.1', 'E15.1'],
             ['E10.1', 'I11.2', 'I10.1_27353757'],
             ['E16.1', 'E18.1', 'E17.3'],
             ['E1.8', 'I12.1_111682336', 'I12.1_111682195'],
             ['E55.1', 'E57.1', 'E56.1','E88.1'],
             ['U22.3', 'U22.6_13735517', 'U23.1'],
             ['E7.2', 'E9.5', 'E9.3']]
    
    for lst in lists:
        # Length 3, no '_' or 'U', and no number after 'E' occurring more than once.
        if (len(lst) == 3 and not any('_' in item or 'U' in item for item in lst) and
                (Counter(match.group(1) for match in [re.match(r'E(\d+)\.', item) for item in lst] if match is not None)
                 .most_common(1) or [(None, 1)])[0][1] == 1):
            print(lst)
    
    # Output:
    # ['E12.2', 'E16.1', 'E15.1']
    # ['E16.1', 'E18.1', 'E17.3']
    

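    Counter counts how many times each value occurs, and re.match captures the number that follows a leading E in each item (items that do not start with E are skipped). .most_common(1) returns the most frequent number together with its count, and the or [(None, 1)] fallback makes sure that even if there are no matching elements, we can still index into the result. Comparing that count to 1 means the list passes only if no number after an E occurs more than once, which is why ['E7.2', 'E9.5', 'E9.3'] is rejected.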

    Although the earlier code was okay, this is now terrible code and should be moved out to another function instead. :-)
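As a rough sketch of what moving this out into a function could look like: the names `E_NUMBER`, `has_unique_e_numbers`, and `keep` below are hypothetical, and the rule they encode (length 3, no _, no U, no repeated number after E) is inferred from the code above rather than stated by the asker.

```python
import re
from collections import Counter

E_NUMBER = re.compile(r'E(\d+)\.')  # digits between a leading 'E' and the dot


def has_unique_e_numbers(lst):
    """Return True if no 'E<number>.' prefix occurs more than once in lst."""
    matches = (E_NUMBER.match(item) for item in lst)
    counts = Counter(m.group(1) for m in matches if m is not None)
    # An empty Counter means there were no E items, so nothing repeats.
    return all(count == 1 for count in counts.values())


def keep(lst):
    """Hypothetical predicate combining all the conditions used above."""
    return (len(lst) == 3
            and not any('_' in item or 'U' in item for item in lst)
            and has_unique_e_numbers(lst))


lists = [['E12.2', 'E16.1', 'E15.1'],
         ['E10.1', 'I11.2', 'I10.1_27353757'],
         ['E7.2', 'E9.5', 'E9.3']]

result = [lst for lst in lists if keep(lst)]
print(result)
```

Splitting the duplicate check into its own function also removes the need for the or [(None, 1)] fallback, because all() over an empty Counter is simply True.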