I am trying to match everything up to the last "Saving*" line before "ModelFinish". I can almost do this with a negative look-around (described in Regular expression to match a line that doesn't contain a word), but I can't get it working with newlines in the string I'm trying to match. I'm using Notepad++, which has a checkbox for ". matches newline".
Input:
Begin: model 17
Epoch 15800, loss 4051304.017, val_PMAE 6.9
Saving at epoch 15828 with loss: 3974847.290
Saving at epoch 15889 with loss: 3968749.471
ModelFinish: Stop training
Begin: model 18
Saving at epoch 15889 with loss: 3968749.223
Saving at epoch 15889 with loss: 3968749.200
Epoch 15800, loss 4051304.017
ModelFinish: Stop training
Begin: model 19
Desired first match:
Begin: model 17
Epoch 15800, loss 4051304.017, val_PMAE 6.9
Saving at epoch 15828 with loss: 3974847.290
Desired second match:
Begin: model 18
Saving at epoch 15889 with loss: 3968749.223
My attempt (with ". matches newline" checked):
^Begin:(?:(?!Saving.*Model).)*$
My plan is to use Notepad++ to find-and-replace the text I don't want with "", so that I'm left with just the final "loss" from each model (i.e. model 17 loss: 3968749.471, model 18 loss: 3968749.200, etc.).
You don't have to enable the dot matching the newline if you match the newlines using \R to match a Unicode newline sequence.
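For anyone testing the idea outside Notepad++, here is a rough sketch in Python. Python's built-in re module has no \R, so the name ANY_NEWLINE below is a hypothetical stand-in that spells out the line-break sequences \R is commonly documented to cover; treat the exact character set as an assumption and check it against your own regex engine.

```python
import re

# Hypothetical stand-in for \R (Python's re module does not support \R).
# \r\n is listed first so a Windows line ending is consumed as one unit.
ANY_NEWLINE = r"(?:\r\n|[\n\x0b\x0c\r\x85\u2028\u2029])"

# The same two "lines" written with Unix, Windows and old Mac line endings.
samples = ["a\nb", "a\r\nb", "a\rb"]

# Splitting on the stand-in gives the same pieces for every line-ending style.
parts = [re.split(ANY_NEWLINE, s) for s in samples]
print(parts)
```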
To match everything before the last occurrence of Saving before ModelFinish, you could match the lines that don't start with ModelFinish and use a positive lookahead (?=...) that asserts what follows is a newline and Saving.
^Begin:.*(?:\R(?!ModelFinish).*)*(?=\RSaving)
^ asserts the start of a line
Begin:.* matches Begin: and any character except a newline, 0 or more times
(?: opens a non-capturing group
\R(?!ModelFinish) matches a newline and asserts that the next line does not start with ModelFinish
.* matches any character except a newline, 0 or more times
)* closes the non-capturing group and repeats it 0 or more times
(?=\RSaving) is a positive lookahead asserting that what follows is a newline and then Saving
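As a quick sanity check, the pattern can be tried against the sample input from the question. This is only a sketch in Python, not Notepad++ itself: Python's re module lacks \R, so \n is swapped in (assuming the text uses \n line endings), and re.MULTILINE is set so that ^ anchors at each line start. The variable names are made up for illustration.

```python
import re

# Sample input copied from the question, joined with "\n" line endings.
log = "\n".join([
    "Begin: model 17",
    "Epoch 15800, loss 4051304.017, val_PMAE 6.9",
    "Saving at epoch 15828 with loss: 3974847.290",
    "Saving at epoch 15889 with loss: 3968749.471",
    "ModelFinish: Stop training",
    "Begin: model 18",
    "Saving at epoch 15889 with loss: 3968749.223",
    "Saving at epoch 15889 with loss: 3968749.200",
    "Epoch 15800, loss 4051304.017",
    "ModelFinish: Stop training",
    "Begin: model 19",
])

# Same structure as the answer's pattern, with \R replaced by \n
# (an assumption: Python's re has no \R and the text uses \n endings).
pattern = re.compile(
    r"^Begin:.*(?:\n(?!ModelFinish).*)*(?=\nSaving)",
    re.MULTILINE,
)

# No capturing groups, so findall returns the full text of each match.
matches = pattern.findall(log)
for m in matches:
    print(m)
    print("---")
```

The greedy group first consumes every line up to ModelFinish and then backtracks line by line until the lookahead finds a following Saving line, which is why each match stops just before the last Saving line of its block. "Begin: model 19" produces no match because no Saving line follows it.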