Search code examples
pythonalgorithmsetheapdisjoint-sets

Finding the minimum cost for 'm' compatible elements for group 1 and group 2 (algorithm)


Here is a problem statement. I recently had an interview question regarding this:

given arrays compatible1, compatible2, and cost of length n >= 1. cost[i] represents the cost of element i. the respective compatible[i] == 1 if compatible with that group and 0 otherwise. also given min_compatible, which is the minimum number of elements to be compatible for each. for example, if item is compatible with both, it counts toward the quota for both. return the minimum cost to fulfill the compatibility requirement.

My idea was: we either are compatible with one of group1, group2, or both (disjoint). pick the minimum cost that's compatible with both if both still need an element and that one is less than the minimum of the non-intersect ones combined. otherwise, just pick the minimum you need.

My solution looked like this (Python):

from heapq import heappush, heappop

def getMinCost(cost, compatible1, compatible2, min_compatible):
    # set of indices compatible with each
    c1 = {i for i,c in enumerate(compatible1) if c == 1}
    c2 = {i for i,c in enumerate(compatible2) if c == 1}

    # not enough to fulfill reqs
    if len(c1) < min_compatible or len(c2) < min_compatible:
        return -1

    # make disjoint -> one, the other, or both
    c1, c2, c3 = c1 - c2, c2 - c1, c2 & c1

    # fill the disjoint heaps
    heap1 = []
    heap2 = []
    heap3 = []
    for i,c in enumerate(cost):
        if i in c3:
            heappush(heap3, c)
        elif i in c2:
            heappush(heap2, c)
        elif i in c1:
            heappush(heap1, c)

    # handle edge case (one empty early)
    heappush(heap1, float('inf'))
    heappush(heap2, float('inf'))
    heappush(heap3, float('inf'))

    # amount fulfilling each
    a1 = 0
    a2 = 0
    # minimum cost counter
    minCost = 0

    # both are below threshold, prioritize intersection
    while a1 < min_compatible and a2 < min_compatible:

        # if first cond true, advantageous to take the one
        # that fulfills both
        if heap1[0] + heap2[0] >= heap3[0]:
            minCost += heappop(heap3)
            a1 += 1
            a2 += 1
        elif heap1[0] <= heap2[0]:
            minCost += heappop(heap1)
            a1 += 1
        else:
            minCost += heappop(heap2)
            a2 += 1


    # deal with leftovers

    while a1 < min_compatible:

        if heap1[0] <= heap3[0]:
            minCost += heappop(heap1)
        else:
            minCost += heappop(heap3)
            a2 += 1

        a1 += 1

    while a2 < min_compatible:

        if heap2[0] <= heap3[0]:
            minCost += heappop(heap2)
        else:
            minCost += heappop(heap3)
            a1 += 1

        a2 += 1


    return minCost

I succeeded in 8/15 of the test cases, but not all (and the inputs were hidden). Where did I go wrong? I cannot see it and am looking to (1) be aware of my mistake (2) not repeat it in the future.

The errors were all incorrect output.


Solution

  • ...pick the minimum cost that's compatible with both if both still need an element...

    I can't see your test cases so this may or may not be the issue, but you should pick a dual-compatible item even if only one of the minimum requirements still needs to be fulfilled as long as it's more economical than picking a singular-compatible item.

    It seems you do attempt to pick a dual-compatible item when only one category is still needed. However, I believe you're calculating the cost of it incorrectly:

    if heap1[0] <= heap3[0]:

    When you pick a dual item (let's say for category 1), the true cost is the cost of the item minus the most expensive category 2 item that was chosen (because you would be able to knock that item off if you picked the dual one).

    You would need to revise your code keep track of which items were selected (most likely by just adding three more heaps).

    Edit: Your comment really threw me for a loop, and it took me the better part of an hour but I finally came up with a counter-example.

    Consider the follow items and prices:

    Group 1: 10, 20

    Group 2: 30, 90

    Dual: 10, 100

    The minimum required for both groups is 3.

    Your algorithm starts by picking out the first dual for 10, then since the next dual costs 100, it picks the first 1 for 10, then the second 1 for 20.

    Now that group 1 has been fulfilled, we moved on to the third while loop where your algorithm picks the first 2 for 30, which is correct, but then picks the second 2 for 90, but that is incorrect! We should pick the dual for 100, and remove the second 1 for 20, which would result in a total cost of 100 - 20 = 80 instead of 90, but nonetheless fulfilling both groups' minimums.

    I'd like to note that the Achilles heel of your algorithm is that the first while loop only considers the price of the very next item(s); it can't look further into the heap than the first item. That was eventually the key finding a counter-example: your algorithm works fine if it only has to make a single decision in the second/third while, but if it has to make two or more then things can go wrong!

    Adding both group 1 and group 2 items simultaneously during the first while loop should fix the issue. In way of a "proof" that such an algorithm will always produce the minimal solution, consider the following: since all items of the same group provide the same amount of utility (e.g. all group 1 items are identical to each other save for price), the minimum solution will always use only the least expensive items for each group. In other words, if you sort the group 1 items from least to most expensive, there will be some index in the list such that all the items to the left of the index are in the minimum solution, and all the items to the right are not used. Next, notice that since the minimum requirement for both group 1 and 2 and identical, that "dividing index" will be the the same for both group 1 and 2. For example, if the minimum solution contains 4 group 1 items (excluding dual items), then it will also contain 4 group 2 items. In this way, you can think of combining the least expensive group 1 and 2 items into a quasi-dual item, with price equal to the sum of their price. We can do the same with the second least expensive, third, and so on, pairing them together. We can be certain that if one half of the quasi-dual item is in the minimum solution, the other half will be as well, so we can treat the two as a single item. At this point, the algorithm is quite simple: just alternate between picking the cheapest of either the cheapest dual item or cheapest quasi-dual item.

    If one of the groups isn't large enough to supply it's minimum (e.g. if the minimum requirement is 6 but there's only 5 group 2 items), just do the algorithm normally until you run out of quasi-dual items, then pick the cheapest regular dual items.