Search code examples
pythontime-complexityprimesspace-complexity

What is the difference in space and time complexity between keeping non-primes in a hashset vs. creating an array of Booleans?


I am given an integer n, and told to return all the prime numbers from 1 to n.

I know this question has been answered many times on here, but my question is around the two methods of keeping track of the non-primes. The two methods are:

  1. Create an array of size n, where each index represents the number from 1 to n, and use Boolean (i.e. True or False) to mark each entry if it is not a prime; the array is initially empty, but when we hit a prime, we will mark all multiples of the prime as False (i.e. not a prime) in our array, so that when we get to the number we do not need to 'check' if it is a prime. Otherwise, we will 'check' whether the number is a prime by trying modulos from 0 to that number.

  2. Create a set() and iterate from 0 to n as per 1. and each time we run into a prime store all of its multiples in this set, and for every number from 0 to n, we first check if it is in this set, before testing whether it is a prime or not.

Is there any difference in terms of Time and Space complexity with the above two methods?


Solution

  • TLDR: from the point of big-Oh, they're equivalent both in time and space. In practice, however, there is a dramatic difference.

    Time complexity: we're processing all numbers, setting all multiples in both cases (N/2 + N/3 + N/5+ ...). In both cases, setting a flag takes O(1) time. In both cases, space complexity is O(n).

    The difference is, list takes only a few bytes per boolean. Set also takes the same few bytes to store the value, but also has to store key hashes, maintain collision records, and handle resizes. While it all still boils down to asymptotic O(1) complexity, it has a substantial constant multiplier to account for in practice.