Search code examples
pythonsequenceprimesmemoizationcollatz

finding the lowest collatz sequence that gives more that 65 primes


I have a task where I need to find the lowest Collatz sequence that contains more than 65 prime numbers in Python.

For example, the Collatz sequence for 19 is:

19, 58, 29, 88, 44, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1

This sequence contains 7 prime numbers.

I also need to use memoization so it doesn't have to run a "year" to find it. I found code for memoization of Collatz sequences, but I can't figure out how to get it to work when I need only the prime numbers.

Here is the Collatz memoization code that I found:

lookup = {}
def countTerms(n):
    if n not in lookup:
        if n == 1:
            lookup[n] = 1
        elif not n % 2:
            lookup[n] = countTerms(n / 2)[0] + 1
        else:
            lookup[n] = countTerms(n*3 + 1)[0] + 1

    return lookup[n], n

And here is my tester for prime:

def is_prime(a):
    for i in xrange(2,a):
        if a%i==0:
            #print a, " is not a prime number"
            return False
    if a==1:
        return False
    else:
        return True

Solution

  • Your existing code is incorrectly indented. I assume this task is a homework task, so I won't post a complete working solution, but I'll give you some helpful snippets.

    First, here's a slightly more efficient primality tester. Rather than testing if all numbers less than a are factors of a, it just tests up to the square root of a.

    def is_prime(a):
        for i in xrange(2, int(1 + a ** 0.5)):
            if a % i == 0:
                return False
        return True
    

    Note that this function returns True for a = 1. That's ok, since you don't need to test 1: you can pre-load it into the lookup dict:

    lookup = {1:0}
    

    Your countTerms function needs to be modified slightly so that it only adds one to the lookup count when the current n is prime. In Python, False has a numeric value of 0 and True has a numeric value of 1. That's very handy here:

    def count_prime_terms(n):
        ''' Count the number of primes terms in a Collatz sequence '''
        if n not in lookup:
            if n % 2:
                next_n = n * 3 + 1
            else:
                next_n = n // 2
    
            lookup[n] = count_prime_terms(next_n) + is_prime(n)
        return lookup[n]
    

    I've changed the function name to be more Pythonic.

    FWIW, the first Collatz sequence containing 65 or more primes actually contains 67 primes. Its seed number is over 1.8 million, and the highest number that requires primality testing when checking all sequences up to that seed is 151629574372. At completion, the lookup dict contains 3920492 entries.


    In response to James Mills's comments regarding recursion, I've written a non-recursive version, and to make it easy to see that the iterative and the recursive versions both produce the same results I'm posting a complete working program. I said above that I wasn't going to do that, but I figure that it should be ok to do so now, since spørreren has already written their program using the info I supplied in my original answer.

    I fully agree that it's good to avoid recursion except in situations where it's appropriate to the problem domain (eg, tree traversal). Python discourages recursion - it cannot optimize tail-call recursion and it imposes a recursion depth limit (although that limit can be modified, if desired).

    This Collatz sequence prime counting algorithm is naturally stated recursively, but it's not too hard to do iteratively - we just need a list to temporarily hold the sequence while the primality of all its members are being determined. True, this list takes up RAM, but it's (probably) much more efficient space-wise than the stack frame requirements that the recursive version needs.

    The recursive version reaches a recursion depth of 343 when solving the problem in the OP. This is well within the default limit but it's still not good, and if you want to search for sequences containing much larger numbers of primes, you will hit that limit.

    The iterative & recursive versions run at roughly the same speed (at least, they do so on my machine). To solve the problem stated in the OP they both take a little under 2 minutes. This is significantly faster than my original solution, mostly due to optimizations in primality testing.

    The basic Collatz sequence generation step already needs to determine if a number is odd or even. Clearly, if we already know that a number is even then there's no need to test if it's a prime. :) And we can also eliminate tests for even factors in the is_prime function. We can handle the fact that 2 is prime by simply loading the result for 2 into the lookup cache.

    On a related note, when searching for the first sequence containing a given number of primes we don't need to test any of the sequences that start at an even number. Even numbers (apart from 2) don't increase the prime count of a sequence, and since the first odd number in such a sequence will be lower than our current number its results will already be in the lookup cache, assuming we start our search from 3. And if we don't start searching from 3 we just need to make sure our starting seed is low enough so that we don't accidentally miss the first sequence containing the desired number of primes. Adopting this strategy not only reduces the time needed, it also reduces the number of entries in the lookup` cache.

    #!/usr/bin/env python
    
    ''' Find the 1st Collatz sequence containing a given number of prime terms
    
        From http://stackoverflow.com/q/29920691/4014959
    
        Written by PM 2Ring 2015.04.29
    
        [Seed == 1805311, prime count == 67]
    '''
    
    import sys
    
    def is_prime(a):
        ''' Test if odd `a` >= 3 is prime '''
        for i in xrange(3, int(1 + a ** 0.5), 2):
            if not a % i:
                return 0
        return 1
    
    
    #Track the highest number generated so far; use a list
    # so we don't have to declare it as global...
    hi = [2]
    
    #Cache for sequence prime counts. The key is the sequence seed,
    # the value is the number of primes in that sequence.
    lookup = {1:0, 2:1}
    
    def count_prime_terms_iterative(n):
        ''' Count the number of primes terms in a Collatz sequence 
            Iterative version '''
        seq = []
        while n not in lookup:
            if n > hi[0]:
                hi[0] = n
    
            if n % 2:
                seq.append((n, is_prime(n)))
                n = n * 3 + 1
            else:
                seq.append((n, 0))
                n = n // 2
    
        count = lookup[n]
        for n, isprime in reversed(seq):
            count += isprime
            lookup[n] = count
    
        return count
    
    def count_prime_terms_recursive(n):
        ''' Count the number of primes terms in a Collatz sequence
            Recursive version '''
        if n not in lookup:
            if n > hi[0]:
                hi[0] = n
    
            if n % 2:
                next_n = n * 3 + 1
                isprime = is_prime(n)
            else:
                next_n = n // 2
                isprime = 0
            lookup[n] = count_prime_terms(next_n) + isprime
    
        return lookup[n]
    
    
    def find_seed(numprimes, start):
        ''' Find the seed of the 1st Collatz sequence containing
            `numprimes` primes, starting from odd seed `start` '''
        i = start
        mcount = 0
    
        print 'seed, prime count, highest term, dict size'
        while mcount < numprimes:
            count = count_prime_terms(i)
            if count > mcount:
                mcount = count
                print i, count, hi[0], len(lookup)
            i += 2
    
    
    #count_prime_terms = count_prime_terms_recursive
    count_prime_terms = count_prime_terms_iterative
    
    def main():
        if len(sys.argv) > 1:
            numprimes = int(sys.argv[1])
        else:
            print 'Usage: %s numprimes [start]' % sys.argv[0]
            exit()
    
        start = int(sys.argv[2]) if len(sys.argv) > 2 else 3 
    
        #Round `start` up to the next odd number
        if start % 2 == 0:
            start += 1
    
        find_seed(numprimes, start)
    
    
    if __name__ == '__main__':
        main()
    

    When run with

    $ ./CollatzPrimes.py 65

    the output is

    seed, prime count, highest term, dict size
    3 3 16 8
    7 6 52 18
    19 7 160 35
    27 25 9232 136
    97 26 9232 230
    171 28 9232 354
    231 29 9232 459
    487 30 39364 933
    763 32 250504 1626
    1071 36 250504 2197
    4011 37 1276936 8009
    6171 43 8153620 12297
    10971 44 27114424 21969
    17647 48 27114424 35232
    47059 50 121012864 94058
    99151 51 1570824736 198927
    117511 52 2482111348 235686
    202471 53 17202377752 405273
    260847 55 17202377752 522704
    481959 59 24648077896 966011
    963919 61 56991483520 1929199
    1564063 62 151629574372 3136009
    1805311 67 151629574372 3619607