regex, regex-lookarounds

Regular expression to match everything up to string


I am trying to match everything up to the last "Saving..." line before "ModelFinish". I can almost do this with a negative lookaround (as described in "Regular expression to match a line that doesn't contain a word"), but I can't get it working across the newlines in the text I'm trying to match. I'm using Notepad++, which has a ". matches newline" checkbox.

Input:

Begin: model 17
Epoch 15800, loss 4051304.017, val_PMAE 6.9
Saving at epoch 15828 with loss: 3974847.290
Saving at epoch 15889 with loss: 3968749.471
ModelFinish: Stop training
Begin: model 18
Saving at epoch 15889 with loss: 3968749.223
Saving at epoch 15889 with loss: 3968749.200
Epoch 15800, loss 4051304.017
ModelFinish: Stop training
Begin: model 19

Desired first match:

Begin: model 17
Epoch 15800, loss 4051304.017, val_PMAE 6.9
Saving at epoch 15828 with loss: 3974847.290

Desired second match:

Begin: model 18
Saving at epoch 15889 with loss: 3968749.223

My attempt (with ". matches newline" checked):

^Begin:(?:(?!Saving.*Model).)*$

My plan is to use Notepad++ to find and replace the text I don't want with an empty string, so that I'm left with just the final "loss" from each model (i.e. model 17 loss: 3968749.471, model 18 loss: 3968749.200, etc.).


Solution

  • You don't need to enable ". matches newline" if you match the line breaks explicitly with \R, which matches any Unicode newline sequence (such as \r\n, \n or \r).

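    To match everything before the last occurrence of Saving that precedes ModelFinish, you can repeatedly match lines that don't start with ModelFinish and then use a positive lookahead (?= to assert that what follows is a newline and Saving. Because the repeated group is greedy, it first consumes every line up to ModelFinish and then backtracks until the lookahead succeeds, so the match ends just before the last Saving line of each block.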

    ^Begin:.*(?:\R(?!ModelFinish).*)*(?=\RSaving)
    
    • ^ Start of line (in Notepad++, ^ matches at the start of every line)
    • Begin:.* Match Begin: and any char except a newline 0+ times
    • (?: Non capturing group
      • \R(?!ModelFinish) Match a newline and assert that the line does not start with ModelFinish
      • .* Match any char except a newline 0+ times
    • )* Close non capturing group and repeat 0+ times
    • (?=\RSaving) Positive lookahead, assert what is on the right is a newline followed by Saving
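
    To check the pattern outside Notepad++, here is a minimal Python sketch run against the sample log from the question. Python's re module has no \R, so the sketch assumes "\n" line endings and substitutes \n for \R; re.MULTILINE stands in for Notepad++'s line-anchored ^. The names PATTERN, LOG and KEEP_HEADER are illustrative only.

```python
import re

# Python's re module does not support \R, so this sketch assumes the log uses
# "\n" line endings and writes \n wherever the Notepad++ pattern has \R.
PATTERN = re.compile(
    r"^Begin:.*(?:\n(?!ModelFinish).*)*(?=\nSaving)",
    re.MULTILINE,  # ^ matches at the start of every line, as in Notepad++
)

# Sample input copied from the question.
LOG = (
    "Begin: model 17\n"
    "Epoch 15800, loss 4051304.017, val_PMAE 6.9\n"
    "Saving at epoch 15828 with loss: 3974847.290\n"
    "Saving at epoch 15889 with loss: 3968749.471\n"
    "ModelFinish: Stop training\n"
    "Begin: model 18\n"
    "Saving at epoch 15889 with loss: 3968749.223\n"
    "Saving at epoch 15889 with loss: 3968749.200\n"
    "Epoch 15800, loss 4051304.017\n"
    "ModelFinish: Stop training\n"
    "Begin: model 19"
)

# Each match runs from "Begin:" up to (not including) the last "Saving" line
# before "ModelFinish". The model 19 block has no "Saving" line, so it is skipped.
matches = PATTERN.findall(LOG)
for m in matches:
    print(repr(m))

# The question's plan: replace every match with "" so the final "Saving" line remains.
cleaned = PATTERN.sub("", LOG)
print(cleaned)

# Hypothetical variation (not part of the answer): capture the "Begin:" line
# and write it back, so each model number stays next to its final loss.
KEEP_HEADER = re.compile(
    r"^(Begin:.*)(?:\n(?!ModelFinish).*)*(?=\nSaving)",
    re.MULTILINE,
)
kept = KEEP_HEADER.sub(r"\1", LOG)
print(kept)
```

    Replacing the matches with an empty string also removes the "Begin: model N" line, so the model number is lost. The KEEP_HEADER variation avoids that; in Notepad++ the equivalent would be ^(Begin:.*)(?:\R(?!ModelFinish).*)*(?=\RSaving) with $1 as the replacement.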
