Tags: python, stack-overflow, sieve-of-eratosthenes, prime-factoring

Handling memory usage for big calculation in python


I am trying to do some calculations in Python, and I ran out of memory. I therefore want to read from and write to a file in order to free memory. I need something like a very big list object, so I thought of writing one line per object to the file and reading/writing those lines instead of keeping everything in memory. Line ordering matters because I will use line numbers as the index. So I was wondering how I can replace lines in a file in Python without moving the other lines around (actually, it is fine to move lines, as long as they end up back where I expect them to be).

Edit

I am trying to help a friend who is no better at Python than I am. This code is supposed to find the largest prime number that divides a given non-prime number. It works for numbers up to about 1 million, but beyond that my memory gets exhausted while building the numbers list.

# a comes from a user input
primes_upper_limit = (a+1) / 2
counter = 3L
numbers = list()
while counter <= primes_upper_limit:
    numbers.append(counter)
    counter += 2L

counter=3
i=0
half = (primes_upper_limit + 1) / 2 - 1
root = primes_upper_limit ** 0.5
while counter < root:
    if numbers[i]:
        j = int((counter*counter - 3) / 2)
        numbers[j] = 0
        while j < half:
            numbers[j] = 0
            j += counter
    i += 1
    counter = 2*i + 3
primes = [2] + [num for num in numbers if num]
for numb in reversed(primes):
    if a % numb == 0:
        print numb
        break
Another Edit

What about writing a separate file for each index? For example, a billion files with long-integer filenames, each containing just a single number?


Solution

  • You want to find the largest prime divisor of a (Project Euler Problem 3). Your current algorithm and implementation do this by:

    1. Generate a list numbers of all candidate primes in range (3 <= n <= sqrt(a), or (a+1)/2 as you currently do)
    2. Sieve the numbers list to get a list of primes {p} <= sqrt(a)
    3. Trial Division: test the divisibility of a by each p. Store all prime divisors {q} of a.
    4. Print all divisors {q}; we only want the largest.
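
The four steps above can be sketched compactly as follows. This is only an illustration under some assumptions, not the asker's code: the helper names (isqrt, sieve_odd_primes, largest_prime_divisor_by_sieve) are made up for this example, the sieve keeps one byte per odd candidate in a bytearray instead of a list of long integers, and the syntax is chosen so that it also runs on Python 3. Step 3 additionally keeps the leftover cofactor, since a may have one prime divisor larger than sqrt(a).

```python
from math import sqrt

def isqrt(n):
    """Integer square root, corrected for floating-point rounding."""
    r = int(sqrt(n))
    while r * r > n:
        r -= 1
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r

def sieve_odd_primes(limit):
    """Steps 1-2: return all primes <= limit with an odds-only sieve."""
    if limit < 2:
        return []
    # flags[i] stands for the odd number 2*i + 3 (hypothetical layout)
    size = (limit - 1) // 2
    flags = bytearray([1]) * size
    i = 0
    while (2 * i + 3) ** 2 <= limit:
        if flags[i]:
            p = 2 * i + 3
            # cross out odd multiples of p, starting at p*p
            for j in range((p * p - 3) // 2, size, p):
                flags[j] = 0
        i += 1
    return [2] + [2 * k + 3 for k in range(size) if flags[k]]

def largest_prime_divisor_by_sieve(a):
    """Steps 3-4: trial-divide a by each sieved prime, keep the largest."""
    divisors = [p for p in sieve_odd_primes(isqrt(a)) if a % p == 0]
    # whatever is left after removing those primes is 1 or one prime > sqrt(a)
    rest = a
    for q in divisors:
        while rest % q == 0:
            rest //= q
    if rest > 1:
        divisors.append(rest)
    return divisors[-1] if divisors else None
```

Sieving only up to sqrt(a) and storing one byte per odd candidate keeps the memory footprint far below that of a list holding every odd number up to (a+1)/2.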

    Sieving and trial division do not scale well, as Owen and I note in the comments. For large a (billions or trillions) you really should use NumPy. That said, here are some comments on implementing this algorithm:

    1. You only need to test candidates up to √a, i.e. int(math.sqrt(a)), not (a+1)/2 as you do: once every prime divisor up to √a has been divided out, whatever remains of a is either 1 or a single prime.
    2. There is no need to build a huge list of candidate numbers and then sieve it for primality; the numbers list is what does not scale. Just construct the list primes directly. You can use while/for-loops and xrange(3, int(sqrt(a))+2, 2), which yields candidates lazily instead of building a list in memory. As you mention, xrange() overflows at 2**31L, but combined with the sqrt observation, you can still successfully factor numbers up to 2**62.
    3. In general this is inferior to computing the prime decomposition of a: every time you find a prime divisor p | a, you only need to continue with the remaining factor (a/p, a/p², a/p³, and so on). Except for the rare case of very large primes (or pseudoprimes), this greatly reduces the magnitude of the numbers you are working with.
    4. Also, you only ever need to generate the list of primes {p} once; thereafter store it and do lookups, not regenerate it. So I would separate out generate_primes(a) from find_largest_prime_divisor(a). Decomposition helps greatly.
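
Points 1 to 3 above can be combined into a version that stores no list at all. The following is a minimal sketch, not the rewrite given in this answer: the function name largest_prime_factor_no_list is hypothetical, and the syntax is chosen so that it also runs on Python 3. It divides each factor out as soon as it is found, so the loop bound shrinks along with the remaining cofactor.

```python
def largest_prime_factor_no_list(a):
    """Largest prime factor of a >= 2 by trial division, without storing primes."""
    if a < 2:
        raise ValueError("a must be at least 2")
    remaining = a
    largest = None
    # Divide out the only even prime first, so later candidates can be odd only
    while remaining % 2 == 0:
        largest = 2
        remaining //= 2
    candidate = 3
    # Only candidates up to sqrt(remaining) are needed, and remaining shrinks
    while candidate * candidate <= remaining:
        while remaining % candidate == 0:
            largest = candidate
            remaining //= candidate
        candidate += 2
    # Anything left over is a single prime larger than every factor found so far
    if remaining > 1:
        largest = remaining
    return largest
```

Composite candidates such as 9 or 15 are tested too, but they can never divide the remaining cofactor because their prime factors have already been divided out. The memory use is constant; the running time is still bounded by the square root of the largest cofactor that has to be searched.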

    Here is my rewrite of your code, but performance still falls off once a reaches the hundreds of billions (a > 10**11 + 1) due to keeping the sieved list. We could use collections.deque instead of list for primes; its append() is O(1), while list.append() is already amortized O(1), so that is at most a minor optimization.

    # Prime Factorization by trial division
    
    from math import ceil,sqrt
    from collections import deque
    
    # Global list of primes (strictly we should use a class variable not a global)
    #primes = deque()
    primes = []
    
    def is_prime(n):
        """Test whether n is divisible by any prime known so far"""
        global primes
        for p in primes:
            if n%p == 0:
                return False #  n was divisible by p
        return True # either n is prime, or divisible by some p larger than our list
    
    def generate_primes(a):
        """Extend the global list of primes up to sqrt(a), reusing primes found by earlier calls"""
        global primes
        primes_upper_limit = int(sqrt(a))
        # xrange() is lazy, unlike range(), and we step over odd candidates only,
        # so the list has to be seeded with 2 (once, not on every call)
        if not primes:
            primes.append(2)
        print "Generating sieved list of primes up to", primes_upper_limit, "...",
        # Resume at the first odd number after the largest prime already stored
        start = max(3, primes[-1] + 1) | 1
        for number in xrange(start, primes_upper_limit+2, 2):
            if is_prime(number): # use global 'primes'
                #print "Found new prime", number
                primes.append(number) # Found a new prime larger than our list
        print "done"
    
    def find_largest_prime_factor(x, debug=False):
        """Find all prime factors of x, and return the largest."""
        global primes
        # First we need the list of all primes <= sqrt(x)
        generate_primes(x)
        to_factor = x # running value of the remaining quantity we need to factor
        largest_prime_factor = None
        for p in primes:
            if debug: print "Testing divisibility by", p
            if to_factor%p != 0:
                continue
            if debug: print "...yes it is"
            largest_prime_factor = p
            # Divide out all factors of p in x (may have multiplicity)
            while to_factor%p == 0:
                to_factor //= p
            # Stop when all factors have been found
            if to_factor==1:
                break
        else:
            # No break: what is left has no prime factor <= sqrt(x), so it is itself prime
            if to_factor > 1:
                print "Tested all primes up to sqrt(x), remaining factor must be a single prime > sqrt(x) :", to_factor
                largest_prime_factor = to_factor
        print "\nLargest prime factor of x is", largest_prime_factor
        return largest_prime_factor
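
The comment in the listing above notes that the prime list should strictly live in a class rather than a global. One way this might look is sketched below. The class name PrimeCache and its methods are hypothetical, the square-root bound assumes x is small enough for floating-point arithmetic to be accurate (roughly below 2**53), and the syntax is chosen so that it also runs on Python 3.

```python
class PrimeCache(object):
    """Keeps the primes found so far so that later calls can reuse them."""

    def __init__(self):
        self.primes = [2, 3]

    def _is_prime(self, n):
        # self.primes holds every prime below n, so testing up to sqrt(n) suffices
        for p in self.primes:
            if p * p > n:
                return True
            if n % p == 0:
                return False
        return True

    def extend_to(self, limit):
        """Make sure every prime <= limit is stored; does nothing if already there."""
        candidate = self.primes[-1] + 2
        while candidate <= limit:
            if self._is_prime(candidate):
                self.primes.append(candidate)
            candidate += 2

    def largest_prime_factor(self, x):
        """Largest prime factor of x >= 2, dividing out factors as they are found."""
        # Overshoot by one to guard against the float square root rounding down
        self.extend_to(int(x ** 0.5) + 1)
        remaining, largest = x, None
        for p in self.primes:
            if p * p > remaining:
                break
            while remaining % p == 0:
                largest = p
                remaining //= p
        # Anything left over is a single prime larger than the factors found so far
        if remaining > 1:
            largest = remaining
        return largest
```

A single PrimeCache instance can then serve many calls: the first call pays for generating the primes, and later calls with smaller or similar arguments only do lookups and divisions.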