Search code examples
c++proof-of-correctness

Looking for a proof on why my algorithm in codeforces works


I'm trying prove the correctness of my algorithm.
This is the problem in codeforces: https://codeforces.com/contest/1428/problem/C

Here's my code in C++ which was accepted:

#include<iostream>
#include<vector>
#include<string>
#include<algorithm>
 
using namespace std;
 
int main()
{
    int num, ans, top;
    string s;
 
    cin >> num;
 
    for (int i = 0; i < num; i++)
    {
        cin >> s;
 
        ans = s.size();
 
        top = 0; // meaning no characters behind
 
        for (int j = 0; j < s.size(); j++)
        {
            if (top == 0) top++;
            else if (s[j] == 'A') top++;
            else { ans -= 2; top--; }
        }
 
        cout << ans << endl;
    }
    return 0;
}

It's easy to see that if the string still contains B as the "middle character", we can always make more deletions. And in order to make this string as short as possible, we will always try to delete B with an A(So, we still have more B(s) to continue the delete process).

So, the idea to solve this problem is simple: While tracing through the string, we have a variable called top which "collects" all the A(s) and the first character of the string(the original one or the string after some deletion when top = 0). And whenever we encounter a B, it's always nice if we have collected at least one A(top >= 2) or we have to reluctantly erase this B with the first character of the string without knowing whether it's an A or B.

An algorithm correctly solves this problem if and only if:

  1. It delete all middle B(s)
  2. It makes the string as short as possible

But how do we prove the correctness of this algorithm? Clearly, my algorithm does the "right thing" when top >= 2. But when top = 1, how do I prove that the behavior of erasing B with the first character of the string without knowing whether it's an A or B is correct?

My further reasoning goes like this:
In other words, are there any other algorithms that erase the same amount of AB(s) without confronting this reluctance?

26/12/2021:

We only consider the case when the "real reluctance" happens. What I mean is when the first character of the string is B.

So let's say, there was such "better" and correct algorithm(call it X). In other words:

  1. X erases all AB(s) that our old algorithm erases.
  2. There exists moments when our old algorithm encounters a "real reluctance" but X finds an AB to erase.

Question: The thing is where do all those AB(s) lie ?

(First character A)A...A B A...A B ... B*(characters of the original string)
-------------------^----------------^
-------------------|----bound Y----|

B* is where the reluctance happens.

What has to be pointed out first is that according to condition 1, X is not allowed to erase a B outside bound Y with an A inside bound Y. Because otherwise, there exists some B inside bound Y that can't be erased which proves X a wrong algorithm. (*)

Returning to our question, if for every such moment, AB lies entirely inside bound Y, then our old algorithm erases all the AB(s) that X does. So, X can not be better.

So, there exists a moment where AB lies outside Y and statisfies (*). But this AB will, surely, soon be erased by our algorithm(this can be easily imagined and proved).

B*...A...B...(this A will collected by top and will be matched with the latter B).
So, X can not be better.

Therefore, our algorithm is correct. There're no other algorithms that will make the string shorter.


Solution

  • Great write-up.

    This may be more commentary than the "formal proof" you might be seeking.

    Here's something to consider: You don't need the ans variable at all. You simply print top when the inner for-loop completes. When the inner for-loop completes, I would assert that ans==top anyway.

    Hence, this simplification:

        cin >> s;
        top = 0; // meaning no characters behind
    
        for (char c : s)
        {
            if ((top == 0) || (c == 'A'))
            {
                top++;
            }
            else
            {
                top--;
            }
        }
        cout << top << endl;
    }
    

    Your algorithm is effectively emulating the count of a parser stack. Each time a char is read from the input string, it gets pushed onto the stack. If the character was a B (and the stack wasn't previously empty), both the B and the character on the stack before it gets popped off. Coding up the same algorithm as a stack:

    stack stk;
    for (char c : s)
    {
        if ((stk.size() == 0) || (c == 'A'))
        {
           stk.push(c);
        }
        else
        {
           stk.pop();  // 'B' was read => pop the previous element off the stack
        }
    }
    
    cout << stk.size() << endl;
    

    Above is the same algorithm as your own code, but we've just replaced top with an actual stack. Hence, the chars still remaining on the stack when the loop completes are the characters in the original string that couldn't get collapsed. That's how we know top==ans in your original code block.

    And if you wanted to print the actual collapsed string instead of just it's length, just reverse the stack

    string temp;
    while (stk.size() > 0)
    {
       temp = string(s.front()) + temp;
       s.pop();
    }
    cout << temp << endl;