Search code examples
algorithmfactoring

Is there an algorithm to find the nearest number with only small factors?


I need to do some real time DFT and the algorithm I am using is the most efficient when the number of samples can be broken down into small factors.

Suppose that I have a number n and factors 2, 3, 5. How to find the nearest number (compared to n) whose prime factorisation consists of no numbers other than 2,3,5?

n is almost always below 50,000 so brute forcing might be a good idea.


Solution

  • A fast algorithm for pairs of factors

    This doesn't quite solve the problem as stated -- given some integer x, it will only find the nearest greater-or-equal number that has no factors besides 2 and 3 (or any other given pair of factors). But I think it's cute and since it runs in O(log x) time and O(1) space it might be useful regardless! It's similar in concept to the Bresenham line algorithm. In pseudocode:

    1. Start with b = y = 2^RoundUp(log2(x)), which ensures that b = y >= x.
    2. If y < x then set y = y * 3 and go to 2.
    3. If y < b then set b = y. (b records the best candidate so far.)
    4. If y is odd then stop and return b.
    5. Otherwise, set y = y / 2 and go to 2.

    Correctness

    Why does this work? Notice that at all times, we have y = 2^i*3^j for some i, j >= 0, and that over time, i only ever decreases, while j only ever increases. The invariant we maintain on entry to step 2 is "Any value z = 2^a*3^b having a > i or b < j is known to be uninteresting (i.e., invalid or no better than some already-discovered solution), and so doesn't need to be considered". This is clearly true the first time we arrive at step 2, since y is a power of 2 and already >= x, so any number z = 2^a*3^b with a > i would then be at least 2*y (regardless of b) which is worse than y; and no integer z can have fewer than the j = 0 powers of 3 in y.

    (Another way to state this invariant is "Either we have already found the optimal solution, or it is some number z = 2^a*3^b with a <= i and b >= j.")

    If the condition at step 2, "y < x", is satisfied, then y = 2^i*3^j is not a valid solution as it's too low. More strongly, it's clear that for any a <= i, 2^a*3^j cannot be a valid solution either, as any such solution is at least as low as y. So now we know (from the invariant) that any pair (a, b) satisfying (a > i OR b < j) is uninteresting, and we know from the "y < x" test that just succeeded that any pair (a, b) satisfying (a <= i AND b = j) is uninteresting too. Now consider any pair (a, b) satisfying the slightly different condition (a > i OR b < j+1): if (a, b) doesn't satisfy the first condition (from the invariant) for uninterestingness then we have ((a > i OR b < j+1) AND !(a > i OR b < j)), which simplifies to ((a > i OR b < j+1) AND (a <= i AND b >= j)) via De Morgan's rule, and then to (b < j+1 AND a <= i AND b >= j) (because making (a <= i AND b >= j) true requires (a <= i) to be true, forcing (a > i) to be false and thus allowing its elimination from the OR), which is clearly the same as (a <= i AND b = j) -- but that is exactly the condition for which we have just established a second kind of uninterestingness, via the success of the "y < x" test. So this establishes that any pair (a, b) satisfying (a > i OR b < j+1) is uninteresting -- which, after incrementing j, becomes exactly the invariant we want to preserve.

    The justification for decrementing i in step 5 is almost the same, but in reverse, so I won't go into detail on it. The slight difference is that if we get to step 5, instead of having an invalid solution, we simply have a solution that is at least as high as the best solution in b (notice that we updated b if necessary so that this would continue to hold), and so it and every higher solution is (from this point on) uninteresting to us.

    Each value of y generated by the algorithm either has one less factor of 2 or one more factor of 3 than any previously generated value, so it's clear that all generated y values are distinct. The reasoning in the earlier paragraphs establishes that the only y values not generated are those that have been proven to be uninteresting. Thus if the algorithm always halts in a finite time, it is correct. The next section will imply that this is indeed the case.

    Running time

    Step 5 (which has the effect of decreasing i by 1) never executes more than log2(x)+1 times, since i starts at this value or less, nothing else affects i, and when i reaches 0, y will be odd, causing step 4 to terminate the function. But how many times can the condition in step 2 that increases j by 1 fire?

    Initially y >= x, so achieving the y < x condition needed for step 2 to fire requires an execution of step 5. But as soon as y < x is achieved by some execution of step 5, it is immediately undone at the next execution of step 2: this is because in order for step 5 to execute at all, we must have had y >= x, and if we divide y by 2 (at that step 5) and then multiply it by 3 (at the next step 2), it will necessarily be at least as large as it was previously, implying y >= x again, in turn implying that step 2 will never fire twice in a row without an execution of step 5 in between. So step 2 will fire at most as many times as step 5 does, i.e. at most log2(x)+1 times. This bounds the overall runtime of the algorithm at O(log x).

    Notes

    • You can avoid floating-point arithmetic by replacing the log2() in step 1 with a loop that starts y at 1 and keeps doubling it until it is >= x. This is O(log x) and so doesn't hurt the time complexity.
    • You can use any other pair of factors. The only real change is that, if f is the factor "replacing" 2 in the code, then step 4 should instead halt when y % f != 0.