Search code examples
algorithmbinary-searchapproximation

Why does my binary search algorithm miss the optimal solution, and how can it be improved?


I am looking primarily for a theoretical answer, ideally one addressing my misunderstanding (below) about binary search algorithms. Practical examples are welcome as well.

I have a complex function where I want to do a binary search on that function to find the value of n, so that computed_goal matches a given goal, where goal is a previously-given value I am seeking to match. The algorithm below illustrates this.

l = 0, h = 4000; // low, and high bounds for binary search
while (l < h)
{
    n = (l + h) / 2;

    x = vars / n; // vars represents a more complex computation, x depends on n
        
    computed_goal = ax^4 + bx^3 + cx^2 + dx + c; // computed_goal depends on n
        
    if (computed_goal < goal)
        l = n + 1;
    else
        h = n - 1;

    if (abs(computed_goal - goal) / goal < 0.01) // check for relative error within 1%
        return n; // successfully matched goal value!
    else
        return 0; // iteration failed to match goal to desired precision
}

Now, I think you may already be seeing a problem if you are familiar with binary search algorithm, but I'm not super-verse on them. My problem does not always present itself but sometimes it is as follows:

Say my function gives me the following values for these n:

n = 2998: computed_goal = 967.8732568 (-2.1267432 = computed_goal - goal)
n = 2999: computed_goal = 968.6512540 (-1.3487460)
n = 3000: computed_goal = 969.4294677 (-0.5705323)
n = 3001: computed_goal = 970.2078977 (+0.2078977) <= This is the optimal solution
n = 3002: computed_goal = 970.9865442 (+0.9865442)

Now, suppose l = 2987, and h = 3015. This makes n = 3001. My goal is to match value of 970. Note that n = 3001 is the optimal solution because |computed_goal - goal| is the smallest in magnitude and it is the closest to the goal

However, my algorithm looks at it and goes computed_goal = 970.2078977 < goal = 970 = False, and sets new h to be n - 1 = 3000, thus excluding the optimal solution from any future iterations. It so happens that my algorithm finally incorrectly settles in on n = 2999, two steps away from optimal solution, here's a listing for reference:

   l    h      n    computed_goal    goal
2987 3015 = 3001, 970.207897725643 < 970
2987 3000 = 2993, 963.9865161108   < 970
2994 3000 = 2997, 967.095475944351 < 970
2998 3000 = 2999, 968.651254012028 < 970

Misunderstanding

I understand that I am checking but then excluding the optimal solution and because I'm not backtracking back to it, it gets lost and excluded...

I was told before that you should (always) exclude the endpoints that you've already checked, because you've already checked them, so it does not make sense to include them in future search bounds. So like in this case I've checked 3001, and in any future iterations I exclude that value from further checks. I am guessing that is incorrect, and I am guessing I must include them(?) Can you help me with theory part of this?

Attempt to fix. I should note as well that when I change my if statement to include end points, like so:

   if (computed_goal < goal)
        l = n;
    else
        h = n;

Algorithm keeps executing indefinitely in an infinite look with l = 3000 and h = 3001. I am not sure how to go around that. I could change my while to while(l + 1 < h), which will break the infinite loop, but it looks off to me, and I am concerned that I may be introducing a different error into the code, and I don't have enough insight to say if I should be using the l + 1 < h loop condition.

Function Sketch

enter image description here

Code

function matchGoal(int $goal = 970): int
{
    $l = 0;
    $h = 3900;

    while ($l < $h)
    {
        $n = intdiv($h + $l, 2);
        $x = 8321100 / $n;
        $factor = 5.113233695 * pow($n / 3000, 2);

        $computed_goal = (((-8.16965E-10 * $x + -3.15457E-06) * $x + 0.008437163) * $x + 207.9079475) * $factor - 0.076934117;

        if ($computed_goal < $goal)
            $l = $n + 1;
        else
            $h = $n - 1;
    }

    $relative_error = 100 * abs($computed_goal - $goal) / $goal;
    return ($relative_error < 1) ? $n : 0;
}

// outputs 2999, when 3001 is optimal
print matchGoal() . PHP_EOL; 

Solution

  • The binary search would like a monotonic binary predicate to work.
    A binary predicate is a function with two possible values: true and false.
    A monotonic binary predicate is true up to some point and false after, or vice versa.

    Suppose the problem is "find the X such that value f(X) is closest to Y".
    Further, suppose f(X) is increasing.

    First, we reformulate the problem as one with a binary predicate, for example, "find the smallest X such that f(X) >= Y". Or "find the largest X such that f(X) < Y". Now, these are problems solvable by binary search. With the intent of, after solving this simpler problem, return to the original problem in some simple final step: "check X and X-1" or "check X and X+1", respectively, and pick the best of the two.

    Next, I'd like to show an asymmetric version of binary search, instead of one excluding endpoints from both sides. Binary search doesn't have to work this way, but this may show the method from a different point of view than you already have, which may help understanding.

    Let us solve the "find the largest X such that f(X) < Y" version using binary search. Formally, the value of the binary predicate pred(X) is true if f(X) < Y and false if f(X) >= Y. Now, it would be convenient to maintain a semi-interval [lo, hi), such that pred(lo) = true and pred(hi) = false. Initially, lo is the inclusive initial lower bound, and hi is the exclusive initial upper bound. Check the middle value me = (lo + hi) / 2; round to either side, this version will work either way. If pred(me) = true, then [me, hi) is the new interval. If pred(me) = false, then [lo, me) is the new interval. Proceed until lo + 1 = hi: this means the only point in [lo, hi) is lo, and it is the answer (to our binary search subproblem, not to the original problem).