I am studying the Boyer-Moore Algorithm (from here) and I had a quick question - what is the need of the second pass (which essentially just 'confirms' by finding the frequency of that element). Doesn't the first pass itself guarantee that the element found is the majority one? I considered a few examples and felt that a single pass is enough. Could you kindly provide some examples to counter my feeling?
The code (if required) is as below:
int majorityElement(vector<int>& nums) {
int candidate=0, count=0;
for(auto value: nums) {
//update the candidate if the count == 0
if(count==0)
candidate=value;
//if the value == candidate then increment count
if(value==candidate)
count++;
else
//decrement count
count--;
}
//return candidate
return candidate;
}
Edit: If I understand correctly, the algorithm is only applicable when the frequency of the majority element is indeed greater than (vector size())/2
. So, is the second pass really required? Whenever we code, we do some trivial sanity checks (like checking if the input vector is empty), so in this case, why do we have a 'sanity check' as a part of an algorithm? Or is there something more to it?
I think the following intuition for the Boyer-Moore algorithm might shed some light on why two passes are necessary.
The algorithm is based on the following idea. Imagine each element of your array is a person in a room holding a card with a number on it. Each person in the room wanders around until they meet someone else. If the two people are holding different numbers, they each sit down. Otherwise, they keep moving around until they encounter someone else. Eventually, some number of people will be left standing.
If there is a true majority element, the group of the last people standing will definitely have the majority element because no matter how people pair off there’s too many people in the majority for all of them to have been eliminated. But if there isn’t a majority, there could still be someone left standing at the end holding a non-majority element. For example, maybe they just happened to have not encountered anyone with a different value while everyone else sat down.
The reason for the second pass is to differentiate between these two cases. If there is a majority at all, it has to end up as the final candidate. If there isn’t, something might still end up as the final candidate and you need to rule that case out.