ruby regex chess

Regex that does NOT match something I matched before


As part of a question I asked earlier today, my goal is to validate all the moves a rook can make in chess notation.

A rook move consists of:

  • The letter R
  • An optional disambiguation, the source of the problem (discussed in detail later)
  • An optional x to indicate a capture was made
  • The square to which the rook moved (the columns ["files" in chess] are lettered a-h and the rows ["ranks"] are numbered 1-8)

Disregarding disambiguation, we have the simple pattern:

/Rx?[a-h][1-8]/
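As a quick sanity check, the simple pattern can be tried against a few moves. This is only a sketch: the helper name `simple_rook_move?` is made up for illustration, and the `\A` and `\z` anchors are an addition so that the whole string has to match rather than just a substring.

```ruby
# Hypothetical helper: true when +move+ is a rook move without disambiguation.
# The \A and \z anchors are an assumption added here; the pattern above has none.
def simple_rook_move?(move)
  /\ARx?[a-h][1-8]\z/.match?(move)
end

# Print each sample move with the result of the check.
%w[Rd5 Rxd5 Ri5 Rd9 Rdd5].each do |move|
  puts "#{move}: #{simple_rook_move?(move)}"
end
```

Without the anchors, a string such as Rdd5 would still fail, but something like "xxRd5yy" would be accepted because the pattern may match anywhere inside it.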

Disambiguation

It often happens that two rooks can move to a square, and one does. When this happens, a disambiguating letter or number is used. So, if two rooks are on d3 and h5, and the one on h5 moves to d5, it is written Rhd5. Similarly, a rook on d8 moving to d3 when another rook is on d1 is written R8d3.

Files take precedence over ranks. In the first example, if the rook on d3 moved to d5, it could be disambiguated as R3d5 or Rdd5. Only the latter is correct.

The limits on rook disambiguation are:

  • Any letter may be used for file disambiguation, and
  • Any number may be used for rank disambiguation, but the rank of the destination square must not be 1 or 8 (R3d1 is not valid because files take precedence over ranks, so it should be Rdd1), and it must not equal the disambiguating rank (R3d3 is also invalid)

With the above in mind, I constructed this:

/R([a-h]?x?[a-h][1-8]|([1-8])x?[a-h][2-7&&[^\1]])/

The problem lies in the last characters, [2-7&&[^\1]]. Ruby interprets [^\1] literally, that is, as all characters other than \ or 1. If I try putting the \1 outside the brackets ([2-7&&[^]\1]), Ruby complains about a character class with no elements. And if I use an arbitrary placeholder that will never occur, say "z" ([2-7&&[^z]\1]), it doesn't work either (I can't explain why).

So how can I use grouping to NOT match what I matched before?


Solution

  • Your question is long and dense, so I will address the core question and let you implement the technique:

    How can I use grouping to NOT match what I matched before?

    We'll proceed step by step. The following is not an exact chess example, but an illustration of how to accomplish what you want.

    1. Let's say I want a string that matches letters a through h. My regex is ^[a-h]$
    2. Next I want to match a digit and a dash. My regex becomes ^[a-h][0-9]-$
    3. Next I want to match a letter, but not the one we matched before. My regex becomes ^([a-h])[0-9]-(?!\1)[a-h]$, where the ([a-h]) captures the first letter to Group 1, and the negative lookahead (?!\1) asserts that what follows is not the content of what was matched by Group 1 (i.e., it is not that letter).
    4. Let's add a final digit just for balance: ^([a-h])[0-9]-(?!\1)[a-h][0-9]$. This will match a1-b2 but not a1-a2.
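The steps above can be sketched in Ruby as follows. The method names are hypothetical, and `\A`/`\z` are used instead of `^`/`$` because in Ruby `^` and `$` match at line boundaries rather than at the ends of the string. The second method is one possible way to apply the same lookahead to the rook pattern from the question, assuming the rules stated there; it is not a checked, complete chess validator.

```ruby
# Step 4 of the list: letter, digit, dash, a *different* letter, digit.
# (?!\1) fails if the next character equals what group 1 captured.
def distinct_letters?(str)
  /\A([a-h])[0-9]-(?!\1)[a-h][0-9]\z/.match?(str)
end

# Assumption: the same idea applied to the question's rook pattern.
# First branch: optional file disambiguation, any destination square.
# Second branch: rank disambiguation captured in group 1; the destination
# rank must be 2-7 and, via (?!\1), must differ from the captured rank.
def rook_move?(move)
  /\AR(?:[a-h]?x?[a-h][1-8]|([1-8])x?[a-h](?!\1)[2-7])\z/.match?(move)
end

puts distinct_letters?("a1-b2") # true
puts distinct_letters?("a1-a2") # false

%w[Rd5 Rhd5 R8d3 R3xd5 R3d3 R3d1].each do |move|
  puts "#{move}: #{rook_move?(move)}"
end
```

Here R3d3 is rejected by the lookahead and R3d1 by the [2-7] class, while Rhd5, R8d3 and R3xd5 are accepted.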

    Let me know if you have any questions.
