python regex file srt

How to match text in an .srt file and get the timestamp of the line in which the text appears


The return value should be the start time of that sentence.

import re

key = input("ENTER THE KEY PHRASE")
file = open('tcs.srt','r')

for line in file.readlines():
    if re.search(r'^%s'%key, line, re.I):
        print(line)

For example:

Search key: milestone

To be found in: 0:01:25,299 --> 0:01:31,099 one of the significant milestones and great momentum in many of the areas that

The start time 0:01:25,299 should be returned in seconds (85.299).


Solution

  • An .srt file contains timestamps and subtitles. The time format is hours:minutes:seconds,milliseconds. The function below takes a timing line of the form hours:minutes:seconds,milliseconds --> hours:minutes:seconds,milliseconds and returns the first (start) timestamp converted to seconds.

    def return_seconds(line):
        # Take the start timestamp (the text before "-->") and split it into
        # hours, minutes, seconds and milliseconds.
        timeValues = line[:line.index("-->")].strip().replace(",", ":").split(":")
        hours, minutes, seconds, milliseconds = map(int, timeValues)
        # Keep full millisecond precision instead of rounding to two decimals.
        return hours * 3600 + minutes * 60 + seconds + milliseconds / 1000
    
    key = input("ENTER THE KEY PHRASE")
    
    # Remember the most recent timing line, so the lookup still works when a
    # subtitle spans several text lines.
    timestampLine = ""
    
    with open('tcs.srt', 'r') as file:
        for line in file:
            if "-->" in line:
                timestampLine = line
            # Case-insensitive match, as in the question's re.I attempt.
            if timestampLine and key.lower() in line.lower():
                print("Starting seconds at line is {}".format(return_seconds(timestampLine)))
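
If the start times are needed as values rather than printed, the same idea can be wrapped in a function. The following is only a sketch under a few assumptions: the names `TIMING`, `to_seconds` and `find_start_times` are made up for illustration, timing lines are assumed to follow the `H:MM:SS,mmm --> H:MM:SS,mmm` layout shown in the question, and the subtitle text may sit either on the timing line itself or on the lines after it.

```python
import re

# A timing line such as "0:01:25,299 --> 0:01:31,099". Only the start
# timestamp is captured; the end timestamp is matched so it can be skipped.
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*\d+:\d{2}:\d{2},\d{3}"
)


def to_seconds(match):
    """Convert the start timestamp of a TIMING match to seconds."""
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def find_start_times(lines, key):
    """Return the start time in seconds for every line that contains key."""
    key_pattern = re.compile(re.escape(key), re.IGNORECASE)
    results = []
    current_start = None  # start time of the subtitle being read
    for line in lines:
        match = TIMING.search(line)
        if match:
            current_start = to_seconds(match)
            # Search only the text after the timestamps, if there is any.
            text = line[match.end():]
        else:
            text = line
        if current_start is not None and key_pattern.search(text):
            results.append(current_start)
    return results
```

It could be called as `find_start_times(open('tcs.srt'), key)`, or with any list of lines. Using `re.escape` means the key is treated as literal text, so a phrase containing characters such as `.` or `?` is not interpreted as a regular expression.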