Tags: regex, pcre, regex-lookarounds, subtitle, geany

Regular expression to match a line starting with a digit only if the same line also contains letters


I have a subtitle (text) file like the one below:

1
00:00:58,178 --> 00:00:59,327
Some text!

2
00:00:59,329 --> 00:01:01,819
<i>Some text</i>

3
00:01:40,512 --> 00:01:41,629
2350 some text.

4
00:01:41,631 --> 00:01:43,771
Some text.

I have almost figured out how to match the actual subtitle lines with the following regular expression:

^([^\d^\n].*)

But what if a subtitle line itself starts with a digit (as in the third subtitle of the example)? I also need to match lines that start with digits, but only if they contain letters later on the same line, before the line ending.

How can I do that by combining it with the regular expression above?
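One way to see the problem is to run the question's pattern in Python's `re` module. Python is an assumption here, since the question targets PCRE in Geany, but these particular constructs behave the same in both. The sketch compares the original pattern with a hypothetical extension: an alternation whose second branch accepts a line that starts with a digit only if a letter (`[^\W\d_]`) appears later on that line. Note that the original class `[^\d^\n]` also excludes a literal `^`; the extension drops that.

```python
import re

# Sample text copied from the question.
SAMPLE = """1
00:00:58,178 --> 00:00:59,327
Some text!

2
00:00:59,329 --> 00:01:01,819
<i>Some text</i>

3
00:01:40,512 --> 00:01:41,629
2350 some text.

4
00:01:41,631 --> 00:01:43,771
Some text.
"""

# The question's pattern: a line whose first character is not a digit, '^' or newline.
ORIGINAL = re.compile(r"^([^\d^\n].*)", re.MULTILINE)

# Hypothetical extension: either the line does not start with a digit, or it
# starts with a digit and contains at least one letter ([^\W\d_]) later on.
EXTENDED = re.compile(r"^(?:[^\d\n].*|\d.*[^\W\d_].*)$", re.MULTILINE)

print(ORIGINAL.findall(SAMPLE))
print(EXTENDED.findall(SAMPLE))
```

Only the third subtitle differs. The original pattern skips `2350 some text.`, while the extended one keeps it. Index and timestamp lines are still rejected because they contain no letters.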


Solution

  • Update #1

    This update brings a large performance improvement.

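    I assume subtitles can span multiple lines. With multi-line mode enabled (the `m` flag, or `(?m)` at the start of the pattern) so that `^` and `$` match at line boundaries: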

    ^\d+:\d+:[^-]+-->.*\R+\K.+(?:\R.+)*(?=\s*(?:^\d+$|\z))
    

    Explanation:

    ^\d+:\d+:[^-]+-->.*     # Match the timestamp line
    \R+\K                   # One or more newlines; \K then drops everything matched so far from the result
    .+                      # Match the line immediately after it
    (?:\R.+)*               # plus any further lines of the same subtitle (if any)
    (?=\s*(?:^\d+$|\z))     # Stop before a digits-only line or the end of the input
    

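Python's built-in `re` module supports neither `\K` nor `\R`, so the following is a hedged translation of the pattern above rather than a drop-in copy. Line endings are normalised to `\n` first so a plain `\n` can stand in for `\R`. The subtitle text is then captured in group 1 instead of being isolated with `\K`, and Python's `\Z` plays the role of PCRE's `\z`. The helper name `subtitle_blocks` is hypothetical.

```python
import re

# Translation of the PCRE pattern for Python's re module:
# "\n" replaces \R (line endings are normalised first), and group 1
# captures the subtitle text instead of discarding the prefix with \K.
SUBTITLE_RE = re.compile(
    r"^\d+:\d+:[^-]+-->.*\n+"   # timestamp line, then one or more newlines
    r"(.+(?:\n.+)*)"            # first subtitle line plus any continuation lines
    r"(?=\s*(?:^\d+$|\Z))",     # stop before a digits-only line or end of input
    re.MULTILINE,
)


def subtitle_blocks(srt_text: str) -> list[str]:
    """Return the text of each subtitle block (multi-line blocks kept together)."""
    # Convert CRLF and lone CR to LF so "\n" in the pattern covers every line ending.
    normalized = srt_text.replace("\r\n", "\n").replace("\r", "\n")
    # With exactly one capture group, findall returns group 1 for each match.
    return SUBTITLE_RE.findall(normalized)
```

Each list item is one whole subtitle block, with multi-line subtitles joined by `\n`. Like the PCRE original, this relies on blank lines separating the blocks. Without them, `(?:\n.+)*` would run on into the following index and timestamp lines.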