I am writing the elements of a throwaway vector to a file, grouped in the order given by a list of prefix strings.
Given
std::vector<std::string> data {
    "jpg yuvj420p 1080x2400 1:1 9:20",
    "jpeg yuvj420p 1536x2048 72:72 3:4",
    "jpg yuvj444p 1920x1080 1:1 16:9",
    "png rgba 150x150 NA NA",
    /* ... up to 35 (56?) more, all of which start *
     * with one of the strings in extensions       */
};
std::vector<std::string> extensions {
    "png", "jpg", "jpeg", // ... 2 to 4 more
};
std::ofstream file ("data.txt");
Straightforward approach:
for (auto& ext : extensions)
{
    for (auto& it : data)
        if (it.starts_with(ext))
            file << it << '\n';
}
Using std::erase_if:
for (auto& ext : extensions)
{
    std::erase_if(
        data,
        [&](const std::string& it) {  // capture by reference: ext, and file too if it is a local
            if (it.starts_with(ext)) {
                file << it << '\n';
                return true;
            }
            return false;
        }
    );
}
Result:
// data.txt
png rgba 150x150 NA NA
... // more lines starting with 'png'
jpg yuvj420p 1080x2400 1:1 9:20
jpg yuvj444p 1920x1080 1:1 16:9
... // more lines starting with 'jpg'
jpeg yuvj420p 1536x2048 72:72 3:4
...
The idea behind the second approach is to avoid iterating, on every subsequent outer-loop pass, over the elements of data that have already been written to the file. However, is that worth the overhead of calling std::erase_if: data.size() - number_of_previous_matches applications of the predicate (the lambda), some move operations to close the gaps left by the removed elements, and extensions.size() calls to data.erase in total?
Let's say you start with N items and P predicates (one per extension), that the matches are uniformly distributed, that each item can be matched by only one predicate, and that every item is eventually matched.
The first approach makes P passes over N elements, which amounts to N*P predicate evaluations and N*P element accesses.
The second approach performs the following number of predicate evaluations:

N + (N - N/P) + (N - 2N/P) + ... + N/P

(each pass removes N/P elements, so the last pass sees only N/P of them). Factoring out N and writing every term over the common divisor P:

N * (P/P + (P-1)/P + (P-2)/P + ... + 1/P)

The sum in parentheses is (P * (P+1)) / 2 / P, so the total is

N * (P+1) / 2 ... almost (N*P)/2.

So, in short, for a very large number of predicates you do about half the predicate evaluations. On the other hand, you also waste time moving elements, roughly (N*P)/2 moves, so the total number of vector element accesses stays about the same: you merely evaluate the predicate about half as often when P is large. For a small number of predicates the difference matters less (because of the +1), and with a single predicate you actually end up doing more work.
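For example (hypothetical numbers): with N = 40 items and P = 5 extensions, uniformly distributed, the first approach calls starts_with 40 * 5 = 200 times, while the second calls it 40 + 32 + 24 + 16 + 8 = 120 = 40 * (5+1) / 2 times.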
Please benchmark both approaches for your use case, because the above analysis assumes a uniform distribution, which real data usually does not follow.
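If it helps, here is a minimal, self-contained benchmark sketch (not code from the question): it builds uniformly distributed synthetic data, writes to an in-memory stream so that disk I/O does not dominate the measurement, and times each approach with std::chrono. The helper names (make_data, run_nested_loops, run_erase_if, time_ms) and the data sizes are made up for illustration; both starts_with and std::erase_if require C++20.

#include <chrono>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Build per_extension entries for each extension, uniformly interleaved.
static std::vector<std::string> make_data(const std::vector<std::string>& extensions,
                                          std::size_t per_extension)
{
    std::vector<std::string> data;
    for (std::size_t i = 0; i < per_extension; ++i)
        for (const auto& ext : extensions)
            data.push_back(ext + " dummy-payload " + std::to_string(i));
    return data;
}

// First approach: nested loops over the full data for every extension.
static void run_nested_loops(const std::vector<std::string>& data,
                             const std::vector<std::string>& extensions,
                             std::ostream& out)
{
    for (const auto& ext : extensions)
        for (const auto& it : data)
            if (it.starts_with(ext))
                out << it << '\n';
}

// Second approach: erase matched elements so later passes skip them.
// Takes data by value because it consumes its copy.
static void run_erase_if(std::vector<std::string> data,
                         const std::vector<std::string>& extensions,
                         std::ostream& out)
{
    for (const auto& ext : extensions)
        std::erase_if(data, [&](const std::string& it) {
            if (it.starts_with(ext)) {
                out << it << '\n';
                return true;
            }
            return false;
        });
}

template <typename F>
static long long time_ms(F&& f)
{
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}

int main()
{
    const std::vector<std::string> extensions { "png", "jpg", "jpeg", "gif", "bmp" };
    const auto data = make_data(extensions, 100'000); // scale up so the timings are measurable

    std::ostringstream sink1, sink2; // in-memory sinks instead of data.txt
    std::cout << "nested loops: " << time_ms([&] { run_nested_loops(data, extensions, sink1); }) << " ms\n";
    std::cout << "erase_if:     " << time_ms([&] { run_erase_if(data, extensions, sink2); }) << " ms\n";
}

Swapping std::ostringstream for the real std::ofstream gives end-to-end numbers instead, but then the file I/O will likely dwarf the difference between the two loop structures.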