Search code examples
c++sortinglambda

Why does sorting strings by length give incorrect results?


I want to sort a vector of strings first by their length and then in alphabetical order. My quick initial implementation:

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

using namespace std;

int main() {

    vector<string> words {"zzz", "aaazzz", "zz", "a"};

    sort(words.begin(), words.end(), [](auto& w1, auto& w2) {
        if (w1.length() < w2.length()) {
            return true;
        }
        return w1 < w2;
    });

    for (auto& w : words) {
        cout << w << endl;
    }

    return 0;
}

Suddenly gives me the wrong result:

a
zz
aaazzz
zzz

When I change the lambda condition like this:

sort(words.begin(), words.end(), [](auto& w1, auto& w2) {
    if (w1.length() == w2.length()) {
        return w1 < w2;
    }
    return w1.length() < w2.length();
});

the result is ok:

a
zz
zzz
aaazzz

Cannot figure out what's wrong with my initial condition. Why it doesn't produce the desired result when the 1st statement in it is quite unambiguously, imho:

    if (w1.length() < w2.length()) {
        return true;
    }

UPD:

  • have replaced screenshots with text
  • thank you all, I've seen the point

Solution

  • I'm pretty sure this:

        if (w1.length() < w2.length()) {
            return true;
        }
        return w1 < w2;
    

    Should be:

        if (w1.length() < w2.length()) {
            return true;
        }
        if (w1.length() > w2.length()) {
            return false;
        }
        return w1 < w2;
    

    You only want w1 < w2 to applied a "tie-breaker" when the length of the two strings are the same.

    You could possibly optimize out the redundant calls to length(), but I bet the compiler would do that anyway in a release build. But if you want to just have shorter code:

        auto len1 = w1.length();
        auto len2 = w2.length();
        return (len1 != len2) ? (len1 < len2) : (w1 < w2);