I've written a program to benchmark two ways of finding "the longest Collatz chain for integers less than some bound".
The first way uses "backtrack memoization": it keeps the current chain on a stack, from the starting value until it reaches a value already in the hash table, then pops all the stacked values into the hash table with incrementing chain-length values.
The second way uses simpler memoization that stores a chain length only for the starting value of each chain.
To my surprise and confusion, the algorithm that memoizes the entire sub-chain up to the first hash-table hit is consistently slower than the algorithm that memoizes only the starting value.
I'm wondering if this is due to one of the following factors:
1. Is Python really slow with stacks (appending to and popping from a list), enough to offset the performance gains from memoization?
2. Is my code/algorithm bad?
3. Is it simply the case that, statistically, as the integers grow large, the time spent revisiting the non-memoized elements of previously computed Collatz chains becomes asymptotically negligible, so that the overhead of pushing and popping stack elements isn't worth the gains?
In short, I'm wondering if this unexpected result is due to the language, the code, or math (i.e. the statistics of Collatz).
import time

def results(backtrackMemoization, start, maxChainValue, collatzDict):
    print()
    print(("with " if backtrackMemoization else "without ") + "backtracking memoization")
    print("length of " + str(collatzDict[maxChainValue[0]]) + " found for n = " + str(maxChainValue[0]))
    print("computed in " + str(round(time.time() - start, 3)) + " seconds")

def collatz(backtrackMemoization, start, maxChainValue, collatzDict):
    # note: relies on the global maxNum defined at the bottom of the script
    for target in range(1, maxNum):
        n = target
        if (backtrackMemoization):
            stack = []
        else:
            length = 0
        # follow the chain until a value that is already memoized is reached
        while (n not in collatzDict):
            if (backtrackMemoization):
                stack.append(n)
            else:
                length = length + 1
            if (n % 2):
                n = 3 * n + 1
            else:
                n = n // 2
        if (backtrackMemoization):
            # pop the whole sub-chain into the dictionary
            additionalLength = 1
            while (len(stack) > 0):
                collatzDict[stack.pop()] = collatzDict[n] + additionalLength
                additionalLength = additionalLength + 1
        else:
            # memoize only the starting value
            collatzDict[target] = collatzDict[n] + length
        if (collatzDict[target] > collatzDict[maxChainValue[0]]):
            maxChainValue[0] = target

def benchmarkAlgo(maxNum, backtrackMemoization):
    start = time.time()
    maxChainValue = [1]
    collatzDict = {1: 0}
    collatz(backtrackMemoization, start, maxChainValue, collatzDict)
    results(backtrackMemoization, start, maxChainValue, collatzDict)

try:
    maxNum = int(input("enter upper bound> "))
    print("setting upper bound to " + str(maxNum))
except ValueError:
    maxNum = 100000
    print("defaulting upper bound to " + str(maxNum))

benchmarkAlgo(maxNum, True)
benchmarkAlgo(maxNum, False)
There is a tradeoff in your code. Without the backtrack memoization, dictionary lookups miss about twice as many times as when you use it; I measured this for maxNum = 1,000,000.
On the other hand, with backtrack memoization you are constructing a much bigger dictionary, since you are collecting chain lengths not only for the target values but also for every value encountered in the middle of a chain, including values above maxNum. Without backtracking, the final collatzDict holds exactly maxNum - 1 entries (one per target); with backtracking, for maxNum = 1,000,000, it ends up much larger.
There is a cost to writing to this dictionary that many more times, popping all these additional values off the stack, and so on. It seems that, in the end, this cost outweighs the benefit of fewer dictionary lookup misses. In my tests, the code with backtrack memoization ran about 20% slower.
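One way to see this tradeoff for yourself is to count the lookup misses and the final dictionary size directly. Below is a minimal instrumented sketch; the function collatz_instrumented and the misses counter are my own names, not part of the original code, and it uses the same chain-length convention as the code above:

def collatz_instrumented(maxNum, backtrackMemoization):
    # counts dictionary-lookup misses and reports the final dictionary size
    collatzDict = {1: 0}
    misses = 0
    for target in range(1, maxNum):
        n = target
        stack = []
        length = 0
        while n not in collatzDict:
            misses += 1  # each loop iteration corresponds to one failed lookup
            if backtrackMemoization:
                stack.append(n)
            else:
                length += 1
            n = 3 * n + 1 if n % 2 else n // 2
        if backtrackMemoization:
            additionalLength = 1
            while stack:
                collatzDict[stack.pop()] = collatzDict[n] + additionalLength
                additionalLength += 1
        else:
            collatzDict[target] = collatzDict[n] + length
    return misses, len(collatzDict)

print(collatz_instrumented(1000000, True))   # fewer misses, much larger dictionary
print(collatz_instrumented(1000000, False))  # more misses, exactly maxNum - 1 entries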
It is possible to optimize backtrack memoization to keep dictionary lookup misses low while reducing the cost of constructing the dictionary: instead of pushing n alone, push tuples (n, i), where n is as in your code and i is the length of the chain traversed up to that point (i.e. i is incremented at every iteration of the while loop). Such a tuple is put on the stack only if n < maxNum. In addition, keep track of how long the whole chain gets before you find a value that is already in the dictionary (i.e. of the total number of iterations of the while loop). The dictionary obtained in this way will be exactly the same as the one constructed without backtrack memoization, but it will be built in a more efficient way, since a key n is added the first time it is encountered. For this reason, dictionary lookup misses will still be much lower than without backtrack memoization; this is what I observed for maxNum = 1,000,000.
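Here is a minimal sketch of this optimized version (the function name collatz_optimized and all variable names are mine, not from the original code):

def collatz_optimized(maxNum):
    # push (n, i) pairs, but only for values below maxNum, so the final
    # dictionary has the same keys as in the no-backtracking version
    collatzDict = {1: 0}
    for target in range(1, maxNum):
        n = target
        i = 0             # while-loop iterations traversed so far in this chain
        stack = []
        while n not in collatzDict:
            if n < maxNum:
                stack.append((n, i))   # remember the value and its depth in the chain
            i += 1
            n = 3 * n + 1 if n % 2 else n // 2
        # i is now the whole chain length down to the memoized value n;
        # a value pushed at depth j sits i - j steps above n
        while stack:
            m, j = stack.pop()
            collatzDict[m] = collatzDict[n] + (i - j)
    return collatzDict

To benchmark it, time collatz_optimized(maxNum) the same way benchmarkAlgo does; the record can then be read off the finished dictionary, e.g. with max(range(1, maxNum), key=collatzDict.get).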
For larger values of maxNum, the optimized code should run faster than the version without backtrack memoization. In my tests it was about 25% faster for maxNum >= 1,000,000.