I have a text (subtitle) file like the one below:
1
00:00:58,178 --> 00:00:59,327
Some text!

2
00:00:59,329 --> 00:01:01,819
<i>Some text</i>

3
00:01:40,512 --> 00:01:41,629
2350 some text.

4
00:01:41,631 --> 00:01:43,771
Some text.
I have almost figured out how to match the actual subtitle lines with the following regular expression:
^([^\d\n].*)
But what if a subtitle line itself starts with a digit, as in the third subtitle of the example? I also need to match lines that start with digits, but only if they contain letters later on the same line.
How can I do that by combining it with the regular expression above?
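One way to combine the two conditions is an alternation: either the line starts with something other than a digit, or it starts with a digit and contains a letter later on. The sketch below assumes Python's `re` module with `re.MULTILINE` and treats `[^\W\d_]` (a word character that is not a digit or underscore) as "a letter"; `SAMPLE`, `SUBTITLE_LINE` and `subtitle_lines` are illustrative names.

```python
import re

# Sample input from the question, with blank lines between entries.
SAMPLE = """1
00:00:58,178 --> 00:00:59,327
Some text!

2
00:00:59,329 --> 00:01:01,819
<i>Some text</i>

3
00:01:40,512 --> 00:01:41,629
2350 some text.

4
00:01:41,631 --> 00:01:43,771
Some text.
"""

# Alternative 1: the first character is neither a digit nor a newline.
# Alternative 2: the line starts with a digit but contains at least one
#   "letter" later on; [^\W\d_] is roughly any Unicode letter.
SUBTITLE_LINE = re.compile(r"^(?:[^\d\n].*|\d.*[^\W\d_].*)$", re.MULTILINE)


def subtitle_lines(text):
    """Return every line that looks like subtitle text, not an index or timestamp."""
    return SUBTITLE_LINE.findall(text)


print(subtitle_lines(SAMPLE))
# ['Some text!', '<i>Some text</i>', '2350 some text.', 'Some text.']
```

Index lines and timestamp lines are skipped because they contain no letters. A subtitle line made only of digits and punctuation (a year, for example) is still skipped, since on its own it cannot be told apart from an index line.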
Update #1
This update brings a large performance improvement.
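Assuming a subtitle can span multiple lines, use the following pattern (PCRE syntax, with the m flag enabled so that ^ and $ match at line boundaries):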
^\d+:\d+:[^-]+-->.*\R+\K.+(?:\R.+)*(?=\s*(?:^\d+$|\z))
Explanation:
^\d+:\d+:[^-]+-->.*    # Match the timestamp line
\R+\K                  # One or more newlines; \K then drops everything matched so far from the reported match
.+                     # Match the next line (the first line of the subtitle text)
(?:\R.+)*              # Match any following non-empty lines of the same subtitle
(?=\s*(?:^\d+$|\z))    # Stop before a digits-only line (the next index) or the end of the input
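Python's built-in `re` module supports neither `\R` nor `\K`, so the pattern cannot be used there verbatim. A rough port, assuming `\n` line endings (Python's text-mode `open()` translates `\r\n` to `\n` by default), replaces `\R` with `\n`, replaces `\K` with a capturing group around the subtitle text, and uses `\Z` for PCRE's `\z`. The fifth entry in `SAMPLE` is made up to show a two-line subtitle; `SUBTITLE_BLOCK` and `extract_subtitles` are illustrative names.

```python
import re

SAMPLE = """1
00:00:58,178 --> 00:00:59,327
Some text!

2
00:00:59,329 --> 00:01:01,819
<i>Some text</i>

3
00:01:40,512 --> 00:01:41,629
2350 some text.

4
00:01:41,631 --> 00:01:43,771
Some text.

5
00:01:44,000 --> 00:01:46,000
First line
Second line
"""

SUBTITLE_BLOCK = re.compile(
    r"^\d+:\d+:[^-]+-->.*"   # the timestamp line
    r"\n+"                   # newline(s) after it (\R+ in the PCRE version)
    r"(.+(?:\n.+)*)"         # capture the subtitle lines; the group stands in for \K
    r"(?=\s*(?:^\d+$|\Z))",  # stop before the next index line or the end of input
    re.MULTILINE,
)


def extract_subtitles(text):
    """Return the text of each entry; multi-line entries keep their inner newlines."""
    return SUBTITLE_BLOCK.findall(text)


for subtitle in extract_subtitles(SAMPLE):
    print(repr(subtitle))
```

Like the PCRE version, this relies on a blank line between entries, as in standard SRT files: without one, `(?:\n.+)*` keeps consuming lines into the following entries and the whole rest of the file comes back as a single match.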