I'm implementing a dynamic programming (DP) solution to an "unbounded knapsack"-family problem. To be more precise, the problem is to count all the ways to make change for a given amount using a set of n coin denominations (with an unlimited supply of each coin). Here is the LeetCode link just in case: https://leetcode.com/problems/coin-change-ii/
Two approaches that, in theory, should have the same big-O time complexity (O(amount*n)) perform differently.
In essence, the two implementations differ in their recursion trees: in the first case every node branches once per coin, but the tree is, I assume, shallower (has a smaller height); in the second case the recursion tree is binary but deeper. Both implementations are below.
The first one:
from typing import List

bigO = 0  # counts the number of dfs() invocations

class Solution:
    def dfs(self, amount: int, coins: List[int], starting_from: int) -> int:
        global bigO
        bigO += 1
        if amount < 0 or starting_from == len(coins):
            return 0
        if amount == 0:
            return 1
        if self.dp[starting_from][amount] != -1:
            return self.dp[starting_from][amount]
        all_changes = 0
        # branch once for every coin we are still allowed to use
        for i in range(starting_from, len(coins)):
            all_changes += self.dfs(amount - coins[i], coins, i)
        self.dp[starting_from][amount] = all_changes
        return all_changes

    def change(self, amount: int, coins: List[int]) -> int:
        self.dp = [[-1] * (amount + 1) for _ in range(len(coins))]
        return self.dfs(amount, coins, 0)

amount = 4000
coins = ...  # given below for readability
print(Solution().change(amount, coins))  # outputs 3435
print(bigO)  # outputs 24794885
The second one:
from typing import List

bigO = 0  # reset the invocation counter for the second run

class Solution:
    def dfs(self, amount: int, coins: List[int], i: int) -> int:
        global bigO
        bigO += 1
        if amount < 0 or i == len(coins):
            return 0
        if amount == 0:
            return 1
        if self.dp[i][amount] != -1:
            return self.dp[i][amount]
        # only two branches: take coin i again, or move on to the next coin
        all_changes = self.dfs(amount - coins[i], coins, i) + self.dfs(amount, coins, i + 1)
        self.dp[i][amount] = all_changes
        return all_changes

    def change(self, amount: int, coins: List[int]) -> int:
        self.dp = [[-1] * (amount + 1) for _ in range(len(coins))]
        return self.dfs(amount, coins, 0)

amount = 4000
coins = ...  # given below for readability
print(Solution().change(amount, coins))  # outputs 3435
print(bigO)  # outputs 1143881
Note that I include the "bigO" variable to count the actual number of invocations of the method. As you can see, the results differ by an order of magnitude. By the way, the first implementation even hits "Time Limit Exceeded" if you submit it on LeetCode, while the second one passes just fine.
Can anyone please explain the cause of the difference in actual performance? Am I missing something in assuming that both implementations, in theory, have the same time complexity?
I give the value of the coins variable below for better readability of the question (it has to be at least ~200 items long for the difference to show up):
coins = [200,217,234,251,268,285,302,319,336,353,370,387,404,421,438,455,472,489,
506,523,540,557,574,591,608,625,642,659,676,693,710,727,744,761,778,795,812,
829,846,863,880,897,914,931,948,965,982,999,1016,1033,1050,1067,1084,1101,
1118,1135,1152,1169,1186,1203,1220,1237,1254,1271,1288,1305,1322,1339,1356,
1373,1390,1407,1424,1441,1458,1475,1492,1509,1526,1543,1560,1577,1594,1611,
1628,1645,1662,1679,1696,1713,1730,1747,1764,1781,1798,1815,1832,1849,1866,
1883,1900,1917,1934,1951,1968,1985,2002,2019,2036,2053,2070,2087,2104,2121,
2138,2155,2172,2189,2206,2223,2240,2257,2274,2291,2308,2325,2342,2359,2376,
2393,2410,2427,2444,2461,2478,2495,2512,2529,2546,2563,2580,2597,2614,2631,
2648,2665,2682,2699,2716,2733,2750,2767,2784,2801,2818,2835,2852,2869,2886,
2903,2920,2937,2954,2971,2988,3005,3022,3039,3056,3073,3090,3107,3124,3141,
3158,3175,3192,3209,3226,3243,3260,3277,3294,3311,3328,3345,3362,3379,3396,
3413,3430,3447,3464,3481,3498,3515,3532,3549,3566,3583,3600,3617,3634,3651,
3668,3685,3702,3719,3736,3753,3770,3787,3804,3821,3838,3855,3872,3889,3906,
3923,3940,3957,3974,3991,4008,4025,4042,4059,4076,4093,4110,4127,4144,4161,
4178,4195,4212,4229,4246,4263,4280,4297,4314,4331,4348,4365,4382,4399,4416,
4433,4450,4467,4484,4501,4518,4535,4552,4569,4586,4603,4620,4637,4654,4671,
4688,4705,4722,4739,4756,4773,4790,4807,4824,4841,4858,4875,4892,4909,4926,
4943,4960,4977,4994]
Thanks to Kira's answer below it is getting clearer; thank you, Kira!
The part I still don't understand is this: in the second case the recursion tree has a lower branching factor, yes, but it is deeper. For example, for the list [a, b, c], the binary approach reaches the [c, c, c] combination only at the 5th level, while the first approach reaches it at the 3rd. And the longer the list, the bigger the difference. I sketched the trees below :)
The second case:
                               root
                         /              \
                 [a]                           []
             /         \                     /     \
      [a, a]              [a]            [b]          []
       /  \              /   \          /   \        /  \
[a, a, a]   [a, a]   [a, b]   [a]   [b, b]   [b]   [c]   []
                                                  /   \
                                       .....  [c, c]   [c]
                                               /
                                          [c, c, c]
and for the first one:
                                root
                      /           |      \
          [a]                    [b]          [c]
     /    |     \             /     \          |
[a, a]   [a, b]   [a, c]   [b, b]   [b, c]   [c, c]
                          ..............       |
                                           [c, c, c]
What is the best explanation of the fact that the second case, with a lower branching factor but a greater 'depth', performs so much better than the first one, with a bigger branching factor but a smaller 'depth'?
After all, both enumerate the same combinations of the given items. I would appreciate it if someone helped me understand!
"Two approaches that, in theory, should have the same big-O time complexity (O(amount*n)) perform differently."
This is not true: in the first case, your DP transition has a worst-case cost of O(len(coins)), because every memoized state runs a loop over all the remaining coins. In the second case, the transition is just O(1), since each state makes exactly two recursive calls. So the first implementation is O(amount * n * n) overall, while the second one is O(amount * n).
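To make the difference concrete, here is a small sketch that is not from the original post (the helper names count_work_loop and count_work_binary and the tiny inputs are mine): instead of counting method invocations, it counts units of actual work per memoized state, one unit per loop iteration in the loop-based transition versus two units (one per recursive call) in the binary transition.

from typing import List

def count_work_loop(amount: int, coins: List[int]) -> int:
    # Work done by the loop-style transition: one unit per loop iteration.
    n = len(coins)
    dp = [[-1] * (amount + 1) for _ in range(n)]
    work = 0

    def dfs(rem: int, start: int) -> int:
        nonlocal work
        if rem < 0 or start == n:
            return 0
        if rem == 0:
            return 1
        if dp[start][rem] != -1:
            return dp[start][rem]
        total = 0
        for i in range(start, n):  # up to n iterations per memoized state
            work += 1
            total += dfs(rem - coins[i], i)
        dp[start][rem] = total
        return total

    dfs(amount, 0)
    return work

def count_work_binary(amount: int, coins: List[int]) -> int:
    # Work done by the binary transition: exactly two units per memoized state.
    n = len(coins)
    dp = [[-1] * (amount + 1) for _ in range(n)]
    work = 0

    def dfs(rem: int, i: int) -> int:
        nonlocal work
        if rem < 0 or i == n:
            return 0
        if rem == 0:
            return 1
        if dp[i][rem] != -1:
            return dp[i][rem]
        work += 2  # one unit for each of the two recursive calls
        dp[i][rem] = dfs(rem - coins[i], i) + dfs(rem, i + 1)
        return dp[i][rem]

    dfs(amount, 0)
    return work

# Hypothetical small inputs, chosen only for illustration:
print(count_work_loop(100, [1, 2, 5]))    # worst case grows like amount * n * n
print(count_work_binary(100, [1, 2, 5]))  # worst case grows like amount * n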
In general, when calculating the time complexity of a recursive DP, it's a good idea to multiply the number of distinct memoized states (the product of the ranges of the memoized arguments) by the time complexity of a single transition to get a good estimate of the overall time complexity.
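For this particular problem, that rule is perhaps easiest to see in the standard bottom-up formulation of Coin Change II. The sketch below is my own illustration (not code from the question): it fills roughly len(coins) * (amount + 1) table cells with O(1) work per cell, which matches the O(amount*n) behavior of the second recursive version.

from typing import List

def change_bottom_up(amount: int, coins: List[int]) -> int:
    # ways[a] = number of ways to make amount a using the coins considered so far
    ways = [0] * (amount + 1)
    ways[0] = 1  # one way to make 0: take no coins
    for coin in coins:                     # len(coins) passes ...
        for a in range(coin, amount + 1):  # ... over O(amount) cells each
            ways[a] += ways[a - coin]      # O(1) transition per cell
    return ways[amount]

# Hypothetical small inputs for illustration:
print(change_bottom_up(5, [1, 2, 5]))  # 4: [5], [2,2,1], [2,1,1,1], [1,1,1,1,1]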