Algorithm to group consecutive words minimizing length per group

From an input of space-delimited words, how to concatenate consecutive words so that:

each group has a minimum length L (spaces don't count)
longest group length is minimal (spaces don't count)

Example input:

would a cat eat a mouse

Example minimum length:

L = 5

Naive algorithm that solves the first condition but not the second one:

while length of a group is less than L, concatenate next word to group
if last group is shorter than L, concatenate last two groups together

This naive algorithm produces:

group 1: would
group 2: acateat
group 3: amouse
longest group length: 7

Second condition is not solved because a better solution would be:

group 1: woulda
group 2: cateat
group 3: amouse
longest group length: 6

Which algorithm would solve the second condition (minimal longest group) with relatively fast execution as a program? (by fast, I'd like to avoid testing all possible combinations)

I know C, ObjC, Swift, Javascript, Python, but pseudocode is fine.

Solution

This can be done with dynamic programming approach. Let's count a function F(i) - the minimum length of the longest group among correct divisions of the first i words into groups.

F(0) = 0
F(i) = Min(Max(F(j), totalLen(j+1, i))), for j in [0..i-1]

Where

totalLen(i, j) = total length of words from i to j, if the length is at least L 
totalLen(i, j) = MAX, if total length is less than L

The answer is the value of F(n). To get the groups themselves we can save the indices of the best j for every i.

There is a implementation from the scratch in c++:

const vector<string> words = {"would", "a", "cat", "eat", "a", "mouse"};
const int L = 5;
int n = words.size();

vector<int> prefixLen = countPrefixLen(words);

vector<int> f(n+1);
vector<int> best(n+1, -1);
int maxL = prefixLen[n];
f[0] = 0;

for (int i = 1; i <= n; ++i) {
    f[i] = maxL; 
    for (int j = 0; j < i; ++j) {
        int totalLen = prefixLen[i] - prefixLen[j];
        if (totalLen >= L) {
            int maxLen = max(f[j], totalLen);
            if (f[i] > maxLen) {
                f[i] = maxLen;
                best[i] = j;
            }
        }
    } 
}

output(f[n], prev, words);

Preprocessing and output details:

vector<int> countPrefixLen(const vector<string>& words) {
    int n = words.size();
    vector<int> prefixLen(n+1);
    for (int i = 1; i <= n; ++i) {
        prefixLen[i] = prefixLen[i-1] + words[i-1].length();
    } 
    return prefixLen;
}

void output(int answer, const vector<int>& best, const vector<string>& words) {
    cout << answer << endl;

    int j = best.size()-1;
    vector<int> restoreIndex(1, j);
    while (j > 0) {
        int i = best[j];
        restoreIndex.push_back(i);
        j = i;
    }
    reverse(restoreIndex.begin(), restoreIndex.end());
    for (int i = 0; i+1 < restoreIndex.size(); ++i) {
        for (int j = restoreIndex[i]; j < restoreIndex[i+1]; ++j) {
            cout << words[j] << ' ';
        }
        cout << endl;
    }
}

Output:

6
would a 
cat eat 
a mouse

Runnable: https://ideone.com/AaV5C8

Further improvement

The complexity of this algorithm is O(N^2). If it is too slow for your data I can suggest a simple optimization:

Let's inverse the inner loop. First, this allows to get rid of the prefixLen array and it's preprocessing, because now we add words one by one to the group (actually, we could get rid of this preprocessing in the initial version, but at the expense of simplicity). What is more important we can break our loop when totalLen would be not less than already computed f[i] because further iterations will never lead to an improvement. The code of the inner loop could be changed to:

int totalLen = 0;
for (int j = i-1; j >= 0; --j) {
    totalLen += words[j].length();
    if (totalLen >= L) {
        int maxLen = max(f[j], totalLen);
        if (f[i] > maxLen) {
            f[i] = maxLen;
            best[i] = j;
        }
    }
    if (totalLen >= f[i]) break;
}

This can drastically improve the performance for not very big values of L.