Search code examples
pythondictionarycounterprobability

Python: Efficiently expand Counter keys with defined list and refresh


I'm expanding a list of unique combinations with replacements, and their frequencies, and though I've seen more elegant solutions using itertools, the approaches are not fast enough or require too much memory when combinations become massive. Below, I've taken a bottoms-up approach which significantly reduces elements in the list.

The answer here is a great approach, but paring down to uniques becomes unwieldy when the iterator is too large: Alternative answer with iterator

For example, with 5 objects and 15 samples, there are 30 billion combos itertools will create, but only 3,876 ( Cr(5,15) ) unique combos. In the next round, the below will build 3,876*5 new combos and pull previous counts to create Cr(5,16). This is MUCH faster than paring down from 30billion iterations.

However, I feel like my code is not efficient at all and it begins to take a significant amount of time much earlier than I'd expect (when uniques are say, 50k total) at Cr(5,30). An iterator at that point would be 9e+20 combo vs building up from the 45k uniques from previous round.

Below is what I'm using to brute force through it; Is there a way to efficiently extend each Counter key by the same new choices (e.g., 0:5), sort, recount? The loop below will be fed with a seed counter with an initial set of tuples and counts I saved from a previous run.

round_list = previously_loaded_counter
start_time = time.time()
#expanding build
for i in range(round_count - 1):  ## Number of rounds added
    loop_time = time.time()
    print(f"Round: {i + 1}")
    round_list.append(collections.Counter())
    for item, count in round_list[i].items():  ## e.g., item = (1, 1), count = 1
        for choice in range(choice_count):
            next_item = item + (choice,)  ## new tuple, (1, 1, 0)
            next_item_sorted = tuple(sorted(next_item))  ## clearly could combine with above, (0, 1, 1)
            round_list[i + 1] += collections.Counter({next_item_sorted: count})  ## fill counter with new tuple, use previous iterations count as seed
    print(f"Loop time: {round(time.time() - loop_time, 2)} seconds")
    print(f"Record count: {len(round_list[i+1])}")

The gist of output should take (where (0,1) and (1,0) are not unique):

Counter({(0, 0): 1, (0, 1): 2, (1, 1): 1)})

and output:

Counter({(0, 0, 0): 1, (0, 0, 1): 3, (0, 1, 1): 3, (1, 1, 1): 1)})

As you can see, it just added 0 to each, then 1 to each, and regrouped them.

Any thoughts would be super appreciated! It all works, I just feel like I've added a ton of unnecessary loops...


Solution

  • I've managed to come up with a fairly efficient solution that runs fairly quickly

    choices = (0, 1, 2, 3, 4)
    repeat = 40
    counter = Counter()
    counter.update({(x, ): 1 for x in choices})
    
    for i in range(1, repeat):
        new_counter = Counter()
        for sorted_combination, previous_count in (
            (tuple(sorted(k + (choice, ))),  v)
            for k, v in counter.items()
            for choice in choices
        ):
            new_counter[sorted_combination] += previous_count
        counter = new_counter
    
    print(counter.most_common(10))