Search code examples
c#arraysalgorithmprimes

In a sorted array of prime numbers arr find index i of the smallest number in arr such that arr[i] divides given number


Consider the following data:
arr = [2,3,5,7,11,13,17,19]
n = 100
Output of the given should be 0 since n % 2 = 0 and arr[0] = 2.

Since array is sorted I want to use that to my advantage and design an algorithm that can work in a complexity smaller than O(n), which you have by iterating over the whole array and finding first i such that n % arr[i] = 0.

I instantly thought of using modified Binary Search with the same complexity as the regular Binary Search(O(logN)), which does the same as a regular Binary Search which divides the search interval in half in each iteration with slight modification. Here is the implementation i used:

public int SmallestFactorIndex(int n)
{
    int left = 0;
    int right = max;
    int index = -1;

    while (left <= right)
    {
        int middle = (right - left) / 2 + left;
        int middleValue = sieve[middle];

        if (n % middleValue == 0)
        {
            index = middle;
            right = middle - 1;
        }
        else if (middleValue < n)
        {
            left = middle + 1;
        }
        else
        {
            right = middle - 1;
        }
    }

    return index;
}

Max represents length of the array - 1 and sieve is the sorted array of prime numbers. Only addition to the Binary Search is that if the current number does divide n, I dont instantly return that index but continue searching for a smaller number than that current number by moving the right border to the middle one. All other parts work pretty much the same way. I dont think I can take advantage of the fact that the array only consists of the prime numbers but I might be wrong.


Solution

  • i want to make this a log(N) operation

    O(log n) is commonly associated with a binary algorithm: you split the data set at some midpoint; based on this midpoint you then decide which half to keep and which to discard, and repeat with the kept half until you zero in on the result.

    This works great when the data is presorted (it is, yay) and an item can be known (on it's own!) to definitely solve the query. And there's our trouble. If n is, say, 286 (2 x 11 x 13), then any of 2, 11, and 13 are potential solutions, and landing on any of them via binary search doesn't tell us we're done until we fully complete the search... we still have to check everything.

    Abstracting to the general case, this suggests a simple binary search on the array is not good enough by itself, because simply finding A solution doesn't stop the search for THE solution.

    But it can work if we know the result in advance. That is, if we can solve to find the smallest prime factor (regardless of index) in O(log n) time — if we know the value we're looking for is the 2, instead of the 11 or the 13 — we can then also search the array in O(log n) time, and O(log n) plus O(log n) is still O(log n).


    If we look at the input, fully factoring a number is definitely NOT O(log n) -- famously so, to the point a number of important cryptographic algorithms rely on this being much worse.

    But we don't need to fully factor the number. We only need the first (smallest) prime. This can obviously do a little better... but is it good enough? And unfortunately, I think the answer is still, "no".

    Reading a quick refresher, I think it's likely we can beat linear time for the average/typical case, and even make it all the way to O(sqrt n) to find the first prime. O(sqrt n0) + O(log n1) is still O(sqrt n), but at least it's still better (on the surface) than O(n).


    But this is where things get a little tricky. Now we also have to worry about keeping the "n" from the input variable separate from the "n" for the primes array. When we do that, we realize the "n" for the input variable involves is every integer up to that value, rather than just the (much smaller) length of the pre-computed primes array.

    Therefore, for typical n inputs up to about a million or so (and that number is a complete guess, but you could write some code to validate and refine it), the linear search is likely the best you can do.


    Speaking of writing code to pre-validate inputs, this brings up the next option:

    O(1)

    If you can put some kind of reasonable boundary on the possible input data, it's not out of the question to fully pre-compute the results for every candidate integer and put those values into a O(1) dictionary. Even a million candidates would only take about 8 MB RAM, and suddenly now every input only requires a dictionary lookup.