I have a subtitle file of a movie, like below:
2
00:00:44,687 --> 00:00:46,513
Let's begin.
3
00:01:01,115 --> 00:01:02,975
Very good.
4
00:01:05,965 --> 00:01:08,110
What was your wife's name?
5
00:01:08,943 --> 00:01:12,366
- Mary.
- Mary, alright.
6
00:01:15,665 --> 00:01:18,938
He seeks the spirit
of Mary Browning.
7
00:01:20,446 --> 00:01:24,665
Mary, we invite you
into our circle.
8
00:01:28,776 --> 00:01:32,834
Mary Browning,
we invite you into our circle.
....
Now I want to match only the actual subtitle text, like:
- Mary.
- Mary, alright.
Or
He seeks the spirit
of Mary Browning.
including any special characters, digits and/or newline characters it may contain. But I don't want to match the timestamp lines or the serial numbers.
So basically I want to match every line that contains letters (possibly mixed with digits and special characters), but not the lines that consist only of digits and special characters, such as the timestamps and serial numbers.
How can I use a regex to match each subtitle and wrap it in a tag, <font color="#FFFF00">[subtitle text any...]</font>?
That is, like below:
<font color="#FFFF00">He seeks the spirit
of Mary Browning.</font>
Well, after checking and analysing the file carefully, I just figured out the key to matching all the subtitle text lines.
First, from any subtitle (.srt) file I have to remove the unnecessary carriage-return characters, i.e. \r.
Find: \r+
Replace with: (nothing, i.e. an empty string)
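For anyone doing this in a script rather than an editor, the same step could look roughly like this in Python. It is only a sketch: the function name strip_carriage_returns is made up for illustration, and the sample string is a hand-typed fragment of the subtitles above with Windows-style line endings assumed.

```python
import re

def strip_carriage_returns(text: str) -> str:
    # Equivalent of "Find: \r+" with an empty replacement.
    return re.sub(r"\r+", "", text)

# Hand-typed sample; real .srt files saved on Windows often end lines with \r\n.
sample = "2\r\n00:00:44,687 --> 00:00:46,513\r\nLet's begin.\r\n"
cleaned = strip_carriage_returns(sample)
print(repr(cleaned))
```

In Python's re module the dot matches \r, so if this step is skipped a pattern ending in .* would capture the trailing \r along with the subtitle text.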
Then I just have to match the lines that start with neither a digit nor a newline (which skips blank lines) and replace each one with its own text wrapped in a <font> tag with the colour value, as below:
Find: ^([^\d\n].*)
Replace with: <font color="#FFFF00">\1</font>
(The space after each colon is just for presentation and is not part of the pattern.)
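The find/replace above can also be sketched as a small Python script. Everything named here (LINE_RE, BLOCK_RE, tag_lines, tag_blocks) is my own illustration, not part of any subtitle tool. tag_lines mirrors the per-line replacement; tag_blocks is a hypothetical variant that treats consecutive text lines as one unit, so a two-line subtitle gets a single pair of tags, as in the example in the question.

```python
import re

# A line that is not blank and does not start with a digit.
LINE_RE = re.compile(r"^([^\d\n].*)", re.MULTILINE)
# Hypothetical variant: a run of consecutive such lines, matched as one unit.
BLOCK_RE = re.compile(r"^([^\d\n].*(?:\n[^\d\n].*)*)", re.MULTILINE)

def tag_lines(text: str, color: str = "#FFFF00") -> str:
    """Wrap every matching line in its own <font> tag."""
    text = text.replace("\r", "")  # step 1: drop carriage returns
    return LINE_RE.sub(rf'<font color="{color}">\1</font>', text)

def tag_blocks(text: str, color: str = "#FFFF00") -> str:
    """Wrap each run of consecutive text lines in a single <font> tag."""
    text = text.replace("\r", "")
    return BLOCK_RE.sub(rf'<font color="{color}">\1</font>', text)

sample = (
    "6\n"
    "00:01:15,665 --> 00:01:18,938\n"
    "He seeks the spirit\n"
    "of Mary Browning.\n"
)
print(tag_lines(sample))
print(tag_blocks(sample))
```

One limitation of this pattern is worth keeping in mind: a subtitle line that itself begins with a digit (say, "3 days later") is skipped, because the pattern cannot tell it apart from a serial number or timestamp.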
Hope this helps everyone banging their heads over subtitles every day.