Search code examples
pandasdataframematrixiterationcombinations

Using one dataframe to find matching combinations in fixed sets


A B C D E
Key 1 1 -1
Key 2 1 -1
Key 3 1 -1
Key 4 -1 1
Key 5 1 -1
Key 6 1 -1
Key 7 1 -1
Key 8 1 -2 1

Final Result

A B C D E
1 -1

suppose we have the above dataframe where each key is an option used to create a combination to get to the desired Final Result. Suppose that you can also specify max number of combinations it can use to achieve the final result below, how would one iterate through the dataframe and when a set of combinations equals the final result, it prints all the keys that make up the combo as well as number of combos it took?

For example, let's say the maximum number of combinations is 3-key combo. Then the following combinations will satisfy both the Final Result and stay under or equal to the number of key combos allowed to achieve it

Key 2 (itself), combos 1

Key 4 + Key 5, combos 2

Key 3 + Key 8, combos 2


Solution

  • Assuming:

    df = pd.DataFrame({
        'A': [1, None, None, -1, 1, None, None, None],
        'B': [-1, 1, None, 1, None, 1, None, 1],
        'C': [None, -1, 1, None, -1, None, 1, -2],
        'D': [None, None, -1, None, None, -1, None, 1],
        'E': [None, None, None, None, None, None, -1, None]
    }, index=['Key 1', 'Key 2', 'Key 3', 'Key 4', 'Key 5', 'Key 6', 'Key 7', 'Key 8'])
    
    final = pd.Series([None, 1, -1, None, None], index=['A', 'B', 'C', 'D', 'E'])
    

    You could use itertools to produce the combinations, and reindex+sum to compare to the expected output:

    # helper function to produce the combinations
    from itertools import combinations, chain
    def powerset(iterable, max_set=None):
        s = list(iterable)
        if max_set is None:
            max_set = len(s)
        return chain.from_iterable(combinations(s, r) for r in range(1, max_set+1))
    
    # loop over the combinations 
    # reindex and sum
    # compare to final with nulls as 0
    MAX_N = 3
    for c in powerset(df.index, MAX_N):
        if df.reindex(c).sum().eq(final, fill_value=0).all():
            print(c)
    

    Output:

    ('Key 2',)
    ('Key 3', 'Key 8')
    ('Key 4', 'Key 5')
    ('Key 1', 'Key 2', 'Key 4')
    

    Note that this produces Key1/Key2/Key4 as a valid combination since Key1/Key4 cancel themselves and Key2 alone is valid.

    To avoid this, you could keep track of the produced combinations and only retain those that are not a superset of already seen valid combinations:

    MAX_N = 3
    seen = set()
    for c in powerset(df.index, MAX_N):
        if df.reindex(c).sum().eq(final, fill_value=0).all():
            f = frozenset(c)
            if not any(f > s for s in seen):
                seen.add(f)
                print(c)
    

    Output:

    ('Key 2',)
    ('Key 3', 'Key 8')
    ('Key 4', 'Key 5')