Search code examples
algorithmoptimizationpseudocodeparagraph

Minimizing total empty space in a paragraph - algorithm


For an application I want to find a way to minimize, by penalizing, the total empty space at the end of each line of a paragraph. I have a set of words W = [w1, w2, w3, ..., wn] that constitute the text I want the paragraph to contain and gor each word wi I have its corresponding length li. I also know the maximum amount of characters, including spaces, that a line can fit:m. I cannot hiphenate words.

In this situation I have defined a relation that describes the cost of the empty space of a line starting with the word i and finishing with the word j is given by c(i, j) = (m - (j - i) - sum_{k=i}^{k = j}lk)^3. So c(i, j) must be positive, otherwise I need to break the line and if j = n, I don't penalize the empty space of the last line: c(i, n) = 0.

With this parameters, I have found an algorithm that minimizes the cost of each line, before passing to the next one and calculates the total cost. However, minimizing the cost of each line does not necessarily mean that the total cost is also minimized.

Any process I can think of that would minimize the total cost requires a huge number of permutations of the number of words in each line and, hence, cannot be implemented. Any ideas of a viable algorithm to calculate the minimal cost?


Solution

  • Let G be a graph, wherein each vertex V_x_y represents a partial paragraph consisting of y lines that use a total of x words. The graph has an edge from V_x_y to V_z_(y+1) if z > x and words w_(x+1) through w_z fit on a line. Every such edge has cost c(x+1,z), i.e, the cost of the additional line that it represents.

    Now, your problem is to find the least cost path from V_0_0 to a vertex V_n_y that consumes all words.

    You can use Dijkstra's algorithm to find this path in O(n^2 log n) time or better, or A* to find it even faster if you formulate a decent admissible heuristic.