regex notepad++

Regular Expression to Split Text


I am trying to split text (chess notation) into separate lines, one per move. A line should contain either the move number and the move (1. e4) when it is White to move, or just the move (c5) when it is Black to move. This is an example of my input:

1. e4 c5 2. Nf3 d6 3. d4 cxd4 4. Nxd4 
Nf6 5. Nc3 a6 6. h3 e5 7. Nde2 h5 8.
g3 Be6

This is the output I am looking for:

1. e4
c5
2. Nf3
d6
3. d4
cxd4
4. Nxd4
Nf6
5. Nc3
a6
6. h3
e5
7. Nde2
h5
8. g3
Be6

I have made some progress in finding a pattern that matches the first part, but I am not sure how to do the actual split. There are also rare cases where a match is broken across two lines, e.g. 8.[new line here]g3 instead of 8. g3, and I would like to match those as well.

[0-9]+\.\s?[A-Za-z0-9]+

This matches the move number, the dot, the space and the actual move. What I want to replace, however, is the space that follows the match, not the matched text itself. For the Black moves I was trying this:

[^0-9][^.][A-Za-z0-9]+

but it also matches . e4 (part of a White move) instead of only the Black moves like c5.
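The over-matching can be reproduced outside Notepad++. The following is a minimal Python sketch (Python is an assumption here; the question itself only uses Notepad++, and the name black_attempt is made up for illustration). It shows that [^0-9] happily consumes the dot and [^.] consumes the space, so the pattern starts matching at ". e4":

```python
import re

# The pattern tried for Black moves, copied unchanged from the question.
black_attempt = re.compile(r"[^0-9][^.][A-Za-z0-9]+")

# findall returns every non-overlapping match, scanning left to right.
matches = black_attempt.findall("1. e4 c5 2. Nf3 d6")
print(matches)  # ['. e4', ' c5', '. Nf3', ' d6']
```

The two negated classes only say what a character must not be, so they do not anchor the match to the start of a Black move.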


Solution

  • It looks like each move number (digits followed by a dot) is always followed by two "words". Capture the three parts and reformat the match as you need (with Search Mode set to Regular expression):

    Find What: (\d+\.)\s+(\w+)\s+(\w+)\s*
    Replace With: $1 $2\n$3\n

    Details:

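    • (\d+\.) - Group 1 ($1): one or more digits followed by a literal .
    • \s+ - one or more whitespace characters (line breaks included, which covers a number and its move split across two lines)
    • (\w+) - Group 2 ($2): one or more word characters (White's move)
    • \s+ - one or more whitespace characters
    • (\w+) - Group 3 ($3): one or more word characters (Black's move)
    • \s* - zero or more trailing whitespace characters, consumed so the next line does not start with a space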

    [Screenshot: demo of the replacement]
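    The same find-and-replace can be tried outside Notepad++. Below is a minimal sketch in Python, which is an assumption and not part of the original answer; the names MOVE_PAIR and split_moves are made up for illustration. The pattern is the one above, but Python writes backreferences as \1, \2, \3 instead of $1, $2, $3:

```python
import re

# Same pattern as in the answer. \s also matches line breaks, so a move
# number separated from its move by a newline ("8.\ng3") still matches.
MOVE_PAIR = re.compile(r"(\d+\.)\s+(\w+)\s+(\w+)\s*")


def split_moves(text: str) -> str:
    """Put each White move (with its number) and each Black move on its own line."""
    # Python's replacement syntax uses \1, \2, \3 where Notepad++ uses $1, $2, $3.
    return MOVE_PAIR.sub(r"\1 \2\n\3\n", text)


# The sample input from the question, including its two awkward line breaks.
sample = (
    "1. e4 c5 2. Nf3 d6 3. d4 cxd4 4. Nxd4 \n"
    "Nf6 5. Nc3 a6 6. h3 e5 7. Nde2 h5 8.\n"
    "g3 Be6"
)

result = split_moves(sample)
print(result)
```

    Note that \w+ does not cover notation containing non-word characters, such as O-O or Nf3+, and a final White move without a Black reply is left unmatched. Replacing \w+ with \S+ is one possible adjustment for the first case, but it has not been checked against the original data.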