
Regex add tag to subtitles


I have a subtitle file of a movie, like below:

2
00:00:44,687 --> 00:00:46,513
Let's begin.

3
00:01:01,115 --> 00:01:02,975
Very good.

4
00:01:05,965 --> 00:01:08,110
What was your wife's name?

5
00:01:08,943 --> 00:01:12,366
- Mary.
- Mary, alright.

6
00:01:15,665 --> 00:01:18,938
He seeks the spirit
of Mary Browning.

7
00:01:20,446 --> 00:01:24,665
Mary, we invite you
into our circle.

8
00:01:28,776 --> 00:01:32,834
Mary Browning,
we invite you into our circle.
....

Now I want to match only the actual subtitle text, like:

- Mary.
- Mary, alright.

Or

He seeks the spirit
of Mary Browning.

including any special characters, numbers and/or newline characters it may contain. But I don't want to match the time strings and serial numbers.

So basically I want to match all lines that contain letters, possibly together with numbers and special characters, but not the lines that consist only of numbers and special characters, like the time strings and serial numbers.

How can I match each subtitle and wrap it in a <font color="#FFFF00">[subtitle text any...]</font> tag with the help of regex?

That is, like below:

<font color="#FFFF00">He seeks the spirit
of Mary Browning.</font>

Solution

  • Well, after checking and analysing carefully, I figured out the key to matching all the subtitle text lines.

    First, remove the unnecessary carriage-return characters (\r) from the subtitle (.srt) file, so that every line ends with a plain line feed (\n).

    Find: \r+
    Replace with:

    (nothing, i.e. an empty string)

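    Then match every line that starts with neither a digit nor a newline (this skips the serial numbers, the time strings and the blank lines), and replace it with its own text wrapped in a <font> tag with the color value, as below:

    Find: ^([^\d\n].*)
    Replace with: <font color="#FFFF00">\1</font>

    (The space after each colon is only for presentation and is not part of the pattern.)

    The find has to run in a mode where ^ matches at the start of every line. Note that this wraps each line separately, so a two-line subtitle gets two <font> tags, and a subtitle line that itself begins with a digit is skipped.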

    Hope this helps everyone who bangs their head against subtitles every day.
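For anyone who would rather script this than use an editor's find/replace, here is a minimal Python sketch of the two steps above. The function names tag_lines and tag_blocks are made up for illustration. tag_lines mirrors the per-line replacement from the answer. tag_blocks is a variation that is not part of the answer: it assumes every block is laid out as serial number, time string, then text, with blocks separated by blank lines, and it wraps the whole multi-line subtitle in one tag, as in the question's example.

```python
import re

OPEN_TAG = '<font color="#FFFF00">'
CLOSE_TAG = "</font>"


def tag_lines(srt_text: str) -> str:
    """Wrap each subtitle text line separately (the two steps from the answer)."""
    # Step 1: drop carriage returns so only \n line endings remain.
    text = srt_text.replace("\r", "")
    # Step 2: with re.MULTILINE, ^ matches at the start of every line.
    # Lines whose first character is a digit or a newline are left alone.
    return re.sub(
        r"^([^\d\n].*)",
        OPEN_TAG + r"\g<1>" + CLOSE_TAG,
        text,
        flags=re.MULTILINE,
    )


def tag_blocks(srt_text: str) -> str:
    """Wrap the whole text of each subtitle block in a single tag.

    Assumption (not from the answer): each block is
    serial number, time string, then one or more text lines.
    """
    text = srt_text.replace("\r", "")
    out = []
    for block in re.split(r"\n{2,}", text.strip("\n")):
        lines = block.split("\n")
        # Only touch blocks whose second line looks like a time string.
        if len(lines) >= 3 and "-->" in lines[1]:
            body = "\n".join(lines[2:])
            lines = lines[:2] + [OPEN_TAG + body + CLOSE_TAG]
        out.append("\n".join(lines))
    return "\n\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "5\r\n00:01:08,943 --> 00:01:12,366\r\n- Mary.\r\n- Mary, alright.\r\n\r\n"
        "6\r\n00:01:15,665 --> 00:01:18,938\r\nHe seeks the spirit\r\nof Mary Browning.\r\n"
    )
    print(tag_lines(sample))
    print(tag_blocks(sample))
```

The block-based version also keeps text lines that start with a digit, since it decides by position within the block rather than by the first character of the line.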