Search code examples
algorithmknapsack-problemgreedynp-hardcrossword

How to solve crossword (NP-Hard)?


I am currently doing an assignment and I'm stuck with the approach.

I have a crossword problem which consists of an empty grid (no solid square as a conventional crossword would), with a varied width and height between 4 and 400 (inclusive).

Rules:

  1. Words are part of the input - a list of 10 - 1000 (inclusive) English words of varying lengths.
  2. A horizontal word can only intersect a vertical word.
  3. A vertical word can only intersect a horizontal word.
  4. A word can only intersect 1 or 2 other words.
  5. Each letter is worth one point.
  6. Words must have a 1 grid space gap surrounding it unless it is a part of an intersecting word.

Example:

X X X X X X  
X B O S S X  
X X X X X X  

Goal:
Get the maximum possible score within a 5 minute time limit.

So far:
After some research I am aware that this is an NP-Hard problem. Thus the most optimal solution cannot be calculated because every combination cannot be examined.

The easiest solution would appear to be to sort the words according to length and inserting the highest scoring words for maximum score (greedy algorithm).

I’ve also been told a recursive tree with the nodes consisting of alternative equally scoring word insertions and the knapsack algorithm apply to this problem (not sure what the implementation would look like).

Questions:

  1. What allows me to check the maximum number of combinations within a 5 minute time span that scales accordingly to the maximum possible word list and grid size?
  2. What heuristics might I apply when inserting words?

Btw the goal here is to get the best possible solution in 5 minutes. To clarify each letter of a valid word is worth 1 point, thus a 5 letter word is worth 5 points.

Thanks in advance I have been reading a lot of mathematical notation on crossword research papers all day which has seem to have lead me in a circle.


Solution

  • I'd start with a word with following characteristics:

    • It should have max possible intersections.
    • Its length should be such that number of words of that length are minimum in the list.

    ie, word length should be least frequent and most number of intersections.
    Reason for this kind of selection is that it would minimize further possibility of words that can be selected. eg. A word of size 9 with 2 further intersections is selected. These intersecting words are of length 6 and 5 (say). Now, you have removed possibility of all those words of length 6 and 5 whose 3rd char is 'a' and 2nd char is 's' (say, 'a' and 's' are the intersecting letters).

    If there are many places with same configuration, run this selection procedure one or two steps deeper to get a better selection of which part (word) of the grid to fill first.

    Now, try filling in all words in this 1st selected position (since this had min frequency, it should be good to use) and then going deeper in the crossword to fill it. Whichever word results in most points till a deadend is reached, should be your solution. When you reach a dead-end, you can start over with a new word.