Search code examples
regexcrossword

Regular expression for crossword solution


This is a crossword problem. Example:

  • the solution is a 6-letter word which starts with "r" and ends with "r"
  • thus the pattern is "r....r"
  • the unknown 4 letters must be drawn from the pool of letters "a", "e", "i" and "p"
  • each letter must be used exactly once
  • we have a large list of candidate 6-letter words

Solutions: "rapier" or "repair".

Filtering for the pattern "r....r" is trivial, but finding words which also have [aeip] in the "unknown" slots is beyond me.

Is this problem amenable to a regex, or must it be done by exhaustive methods?


Solution

  • Try this:

    r(?:(?!\1)a()|(?!\2)e()|(?!\3)i()|(?!\4)p()){4}r
    

    ...or more readably:

    r
    (?:
      (?!\1) a () |
      (?!\2) e () |
      (?!\3) i () |
      (?!\4) p ()
    ){4}
    r
    

    The empty groups serve as check marks, ticking off each letter as it's consumed. For example, if the word to be matched is repair, the e will be the first letter matched by this construct. If the regex tries to match another e later on, that alternative won't match it. The negative lookahead (?!\2) will fail because group #2 has participated in the match, and never mind that it didn't consume anything.

    What's really cool is that it works just as well on strings that contain duplicate letters. Take your redeem example:

    r
    (?:
      (?!\1) e () |
      (?!\2) e () |
      (?!\3) e () |
      (?!\4) d ()
    ){4}
    m
    

    After the first e is consumed, the first alternative is effectively disabled, so the second alternative takes it instead. And so on...

    Unfortunately, this technique doesn't work in all regex flavors. For one thing, they don't all treat empty/failed group captures the same. The ECMAScript spec explicitly states that references to non-participating groups should always succeed.

    The regex flavor also has to support forward references--that is, backreferences that appear before the groups they refer to in the regex. (ref) It should work in .NET, Java, Perl, PCRE and Ruby, that I know of.