Search code examples
pythonalgorithmgreatest-common-divisor

Recursively calculating the GCDs of a sets members


We have a set of positive integers.

We create a new set by calculating the greatest common divisor of all possible pairs of integers from this set.

We redo the above step until only one member remains in the set.

Is there an O(n) method to calculate how many new sets this process creates and whether the member in the last set will be 1?

Some python code that demonstrates what the process I described.

from itertools import combinations
from fractions import gcd
import random

def gen_new_set(a):
    count = 0
    while len(a) != 0:
       print str(count)+':\t'+str(len(a))+'\t'+str(a)
       a = set(gcd(x[0], x[1]) for x in combinations(a,2))  
       count += 1

if __name__ == '__main__':
    a = set( random.sample(range(1,40), 10))
    gen_new_set(a)

Solution

  • No. There are O(n*n) pairs in the first iteration, and it is possible for any one of them to have a prime factor in common that is shared nowhere else. So in general this is a O(n*n) amount of necessary work to tell the difference between the number of sets S being 1 or 2.

    There is, of course, a O(n) algorithm to find the GCD of the whole set which will tell you if the whole set is 1.

    Now when you say "integer", an arbitrary integer can be extremely hard to factorize (a fact which cryptography makes great use of). But if your integers are of reasonable size, then factoring is actually a fairly efficient operation and you have very few prime factors per integer. So as a pre-processing step you can store a relationship of the number to its factorization. Then make a hash table mapping each prime p to the numbers that it divides. And now it is simple to walk through the primes, for each one taking out the subset of things that divide it, then recursively doing the calculation for that one.

    This algorithm should work out to be something like O(n*k + n*s) where n is the number of integers, k is the average cost of factoring them, and s is the number of subsets that you generate. Which for bit n and small integers is more efficient than performing the operation you describe.