Search code examples
pythonfor-looppython-itertools

Python - building sublist that meets certain criteria from huge number of combinations


Been reading a long time, first time I have not been able to find an answer to something I'm working on.

I have a list of 93 strings which are each 6 characters long. From those 93 strings, I want to identify a set of 20 which all meet a particular criteria relative to the others in the set. While itertools.combinations will give me all possible combinations, not all conditions are worth checking.

For instance if [list[0], list[1], etc] fails because list[0] and list[1] can not be together it doesn't matter what the other 18 strings are, the set will fail every time, and that is a ton of wasted checking.

Currently I have this working with 20 nested for loops, but it seems like there has to be a better/faster way to do it.:

for n1 in bclist:
    building = [n1]
    n2bclist = [bc for bc in bclist if bc not in building]
    for n2 in n2bclist:              #this is the start of what gets repeated 19 times
        building.append(n2)
        if test_function(building): #does set fail? (counter intuitive, True when fail, False when pass)
            building.remove(n2)
            continue
        n3bclist = [bc for bc in bclist if bc not in building]
        #insert the additional 19 for loops, with n3 in n3, n4 in n4, etc
        building.remove(n2)

There are print statements in the 20th for loop to alert me if a set of 20 even exists. The for statements at least allow me to skip sets early when the single addition fails, but there is no memory of when larger combinations fail:

For instance [list[0], list[1]] fails, so skip to [list[0], [list[2]] which passes. Next is [list[0], list[2], list[1]] which will fail because 0 and 1 are together again so it will move to [list[0], list[2], list[3]] which may or not pass. My concern is that eventually it will also test:

  • [list[0], list[3], list[2]]
  • [list[2], list[0], list[3]]
  • [list[2], list[3], list[0]]
  • [list[3], list[0], list[2]]
  • [list[3], list[2], list[0]]

All of these combinations will have the same outcome as the previous ones. Basically I trade the devil of itertools.combinations testing all combinations of sets which I know fail because of early values which fail for the devil of for loops which treat order of values as a factor when I do not care about their order. Both methods significantly increase the time it will take for my code to complete.

Any ideas on how to get rid of the devils would be greatly appreciated.


Solution

  • Use your current method, but keep track of indices as well so that in your inner loops you can skip elements you would have already checked:

    bcenum = list(enumerate(bclist))
    for i1, n1 in bcenum:
        building = [n1]
        for i2, n2 in bcenum[i1+1:]:              #this is the start of what gets repeated 19 times
            building.append(n2)
            if test_function(building): #does set fail? (counter intuitive, True when fail, False when pass)
                building.remove(n2)
                continue
            for i3, n3 in bcenum[i2+1:]:
                # more nested loops
            building.remove(n2)