Search code examples
pythonalgorithmperformanceprocessing-efficiency

Python: how can this code be made to run faster


I am new to Python and I am slowly learning via Codewars. I know this is potentially against the rules but I have an efficiency question.

You are given a list of integers

ls = [100, 76, 56, 44, 89, 73, 68, 56, 64, 123, 2333, 144, 50, 132, 123, 34, 89]

You must write a function choose_best_sum(t, k, ls)

such that you find a combination of k integers from ls such that the sum of those k intergers is as close to or equal to t.

My final solution passes the tests but on the more detailed testing fails perhaps because of efficiency. I am trying to understand efficiency more. Here is my code

import itertools

def choose_best_sum(t, k, ls):
    if sum(sorted(ls)[:k]) > t or len(ls) < k:
       return None
    else:
       combos = itertools.permutations(ls, k)
       return max([[sum(i)] for i in set(combos) if sum(i) <= t])[0]

Could someone highlight where the bottleneck is here (I assume on the permutations call) and how this function could be made faster?

EDIT:

The above solution profiled gave

1806730 function calls in 0.458 seconds

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    0.000    0.000    0.457    0.457 <string>:1(<module>)
    1    0.000    0.000    0.457    0.457 exercises.py:14(choose_best_sum)
742561    0.174    0.000    0.305    0.000 exercises.py:19(<genexpr>)
321601    0.121    0.000    0.425    0.000 exercises.py:20(<genexpr>)
    1    0.000    0.000    0.458    0.458 {built-in method builtins.exec}
    1    0.000    0.000    0.000    0.000 {built-in method builtins.len}
    1    0.032    0.032    0.457    0.457 {built-in method builtins.max}
    1    0.000    0.000    0.000    0.000 {built-in method builtins.sorted}
742561    0.131    0.000    0.131    0.000 {built-in method builtins.sum}
    1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

using the assistance I got my final solution was:

def choose_best_sum(t, k, ls):
   ls = [i for i in ls if i < t and i < (t - sum(sorted(ls)[:k-1]))]
   if sum(sorted(ls)[:k]) > t or len(ls) < k:
      return None
   else:
      return max(s for s in (sum(i) for i in itertools.combinations(ls, k)) if s <= t)

Ordered by: standard name

7090 function calls in 0.002 seconds

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    0.000    0.000    0.003    0.003 <string>:1(<module>)
 2681    0.001    0.000    0.003    0.000 exercises.py:10(<genexpr>)
    1    0.000    0.000    0.003    0.003 exercises.py:5(choose_best_sum)
    1    0.000    0.000    0.000    0.000 exercises.py:6(<listcomp>)
    1    0.000    0.000    0.003    0.003 {built-in method builtins.exec}
    1    0.000    0.000    0.000    0.000 {built-in method builtins.len}
    1    0.000    0.000    0.003    0.003 {built-in method builtins.max}
   17    0.000    0.000    0.000    0.000 {built-in method builtins.sorted}
 4385    0.001    0.000    0.001    0.000 {built-in method builtins.sum}
    1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Solution

  • You have a couple of obvious flaws in the expression

    max([[sum(i)] for i in set(combos) if sum(i) <= t])[0]
    
    1. You are running sum(i) twice for no good reason;

    2. You are packing the result into a list ([sum(i)]) and then unpacking it ([0])

    3. You are converting combos to a set for no reason

    Try replacing it with

    sums = [sum(c) for c in combos]
    return max(s for s in sums if s <= t)
    

    Edit: ok, a few ideas on a better algorithm:

    D'oh! First, use itertools.combinations instead of itertools.permutations. You are just taking the sum; the order of items makes no difference. If you are running on ie k = 4, combinations will return 4! == 24 times fewer entries than permutations on the same input data.

    Second, we want to discard as many items as possible from ls at the very start. Obviously we can throw out any value > t; but we can get a tighter bound than that. If we add the (k - 1) smallest values, the largest allowable value must be <= t - (k-1)_sum.

    (If we were looking for an exact sum, we could run this trick in reverse - adding the (k - 1) largest values would give us a minimum allowable value - and we could repeatedly apply those two rules to discard more possibilities. That does not apply here though.)

    Third, we could look at all combinations of (k - 1) values, then use bisect.bisect_left to jump directly to the best possible k'th value. There's a bit of a complication, because you have to double-check that the k'th value has not already been selected as one of the (k - 1) values - you can't do that directly using the built-in itertools.combinations function, but you could use a modified copy of the itertools.combinations code (ie test that bisect_left returns an index higher than the last one currently in use).

    Together these should speed up your code by a factor of len(ls) * k * k!... Good luck!

    Edit 2:

    Let this be a lesson in the dangers of over-optimization :-)

    from bisect import bisect_right
    
    def choose_best_sum(t, k, ls):
        """
        Find the highest sum of `k` values from `ls` such that sum <= `t`
        """
        # enough values passed?
        n = len(ls)
        if n < k:
            return None
    
        # remove unusable values from consideration
        ls = sorted(ls)
        max_valid_value = t - sum(ls[:k - 1])
        first_invalid_index = bisect_right(ls, max_valid_value)
        if first_invalid_index < n:
            ls = ls[:first_invalid_index]
            # enough valid values remaining?
            n = first_invalid_index   # == len(ls)
            if n < k:
                return None
    
        # can we still exceed t?
        highest_sum = sum(ls[-k:])
        if highest_sum <= t:
            return highest_sum
    
        # we have reduced the problem as much as possible
        #   and have not found a trivial solution;
        # we will now brute-force search combinations of (k - 1) values
        #   and binary-search for the best kth value
        best_found = 0
        # n = len(ls)      # already set above
        r = k - 1
        # itertools.combinations code copied from
        #   https://docs.python.org/3/library/itertools.html#itertools.combinations
        indices = list(range(r))
        # Inserted code - evaluate instead of yielding combo
        prefix_sum = sum(ls[i] for i in indices)          #
        kth_index = bisect_right(ls, t - prefix_sum) - 1  # location of largest possible kth value
        if kth_index > indices[-1]:                       # valid with rest of combination?
            total = prefix_sum + ls[kth_index]            #
            if total > best_found:                        #
                if total == t:                            #
                    return t                              #
                else:                                     #
                    best_found = total                    #
        x = n - r - 1    # set back by one to leave room for the kth item
        while True:
            for i in reversed(range(r)):
                if indices[i] != i + x:
                    break
            else:
                return
            indices[i] += 1
            for j in range(i+1, r):
                indices[j] = indices[j-1] + 1
            # Inserted code - evaluate instead of yielding combo
            prefix_sum = sum(ls[i] for i in indices)          #
            kth_index = bisect_right(ls, t - prefix_sum) - 1  # location of largest possible kth value
            if kth_index > indices[-1]:                       # valid with rest of combination?
                total = prefix_sum + ls[kth_index]            #
                if total > best_found:                        #
                    if total == t:                            #
                        return t                              #
                    else:                                     #
                        best_found = total                    #
            else:
                # short-circuit! skip ahead to next level of combinations
                indices[r - 1] = n - 2
    
        # highest sum found is < t
        return best_found