
parsing .srt files


1
00:00:00,074 --> 00:00:02,564
Previously on Breaking Bad...

2
00:00:02,663 --> 00:00:04,393
Words...

I need to parse SRT files with PHP and print all the subtitles in the file using variables.

I couldn't find the right regular expressions. I need to capture the id, the time and the subtitle text in variables. The printed output must not contain any array() wrappers or the like; it must look just the same as the original file.

That is, I need to print something like:

$number <br> (e.g. 1)
$time <br> (e.g. 00:00:00,074 --> 00:00:02,564)
$subtitle <br> (e.g. Previously on Breaking Bad...)

By the way, I have this code, but it doesn't match the lines. It needs to be edited, but how?

$srt_file = file('test.srt',FILE_IGNORE_NEW_LINES);
$regex = "/^(\d)+ ([\d]+:[\d]+:[\d]+,[\d]+) --> ([\d]+:[\d]+:[\d]+,[\d]+) (\w.+)/";

foreach($srt_file as $srt){

    preg_match($regex,$srt,$srt_lines);

    print_r($srt_lines);
    echo '<br />';

}

Solution

  • Here is a short and simple state machine for parsing the SRT file line by line:

    define('SRT_STATE_SUBNUMBER', 0);
    define('SRT_STATE_TIME',      1);
    define('SRT_STATE_TEXT',      2);
    define('SRT_STATE_BLANK',     3);
    
    $lines   = file('test.srt');
    
    $subs    = array();
    $state   = SRT_STATE_SUBNUMBER;
    $subNum  = 0;
    $subText = '';
    $subTime = '';
    
    foreach($lines as $line) {
        switch($state) {
            case SRT_STATE_SUBNUMBER:
                $subNum = trim($line);
                $state  = SRT_STATE_TIME;
                break;
    
            case SRT_STATE_TIME:
                $subTime = trim($line);
                $state   = SRT_STATE_TEXT;
                break;
    
            case SRT_STATE_TEXT:
                if (trim($line) == '') {
                    $sub = new stdClass;
                    $sub->number = $subNum;
                    list($sub->startTime, $sub->stopTime) = explode(' --> ', $subTime);
                    $sub->text   = $subText;
                    $subText     = '';
                    $state       = SRT_STATE_SUBNUMBER;
    
                    $subs[]      = $sub;
                } else {
                    $subText .= $line;
                }
                break;
        }
    }
    
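    if ($state == SRT_STATE_TEXT) {
        // If the file was missing the trailing blank line, the last sub
        // has not been stored yet. Build a new object for it here rather
        // than reusing $sub, which still points at the previous sub.
        $sub = new stdClass;
        $sub->number = $subNum;
        list($sub->startTime, $sub->stopTime) = explode(' --> ', $subTime);
        $sub->text   = $subText;
        $subs[]      = $sub;
    }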
    
    print_r($subs);
    

    Result:

    Array
    (
        [0] => stdClass Object
            (
                [number] => 1
                [stopTime] => 00:00:24,400
                [startTime] => 00:00:20,000
                [text] => Altocumulus clouds occur between six thousand
            )
    
        [1] => stdClass Object
            (
                [number] => 2
                [stopTime] => 00:00:27,800
                [startTime] => 00:00:24,600
                [text] => and twenty thousand feet above ground level.
            )
    
    )
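
The startTime and stopTime fields shown in the result are plain strings. If numeric offsets are needed, for example to sort or shift subs, a conversion helper could look roughly like the sketch below. The name srtTimeToMs is hypothetical and not part of the parser above, and the sketch assumes PHP 7 or later and timestamps in the usual HH:MM:SS,mmm form.

```php
<?php
// Hypothetical helper (not part of the answer's parser): converts an SRT
// timestamp such as "00:00:02,564" into a number of milliseconds.
function srtTimeToMs(string $time): int
{
    // Expect HH:MM:SS,mmm and reject anything else.
    if (!preg_match('/^(\d+):(\d{2}):(\d{2}),(\d{3})$/', trim($time), $m)) {
        throw new InvalidArgumentException("Not an SRT timestamp: $time");
    }
    // hours -> minutes -> seconds -> milliseconds
    return (((int)$m[1] * 60 + (int)$m[2]) * 60 + (int)$m[3]) * 1000 + (int)$m[4];
}

// Duration of the first sub from the question, in milliseconds.
echo srtTimeToMs('00:00:02,564') - srtTimeToMs('00:00:00,074'), "\n"; // prints 2490
```

With the parser's output this would be called as srtTimeToMs($sub->startTime).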
    

    You can then loop over the array of subs or access them by array offset:

    echo $subs[0]->number . ' says ' . $subs[0]->text . "\n";
    

    To show all subs by looping over each one and displaying it:

    foreach($subs as $sub) {
        echo $sub->number . ' begins at ' . $sub->startTime .
             ' and ends at ' . $sub->stopTime . '.  The text is: <br /><pre>' .
             $sub->text . "</pre><br />\n";
    }
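
The question asks for the number, the time and the text each on its own line followed by a line break, with no array() output. One way to get that from the parsed subs is sketched below. The renderSub helper is a hypothetical name, and the sample object only stands in for the $subs array built by the parser.

```php
<?php
// Sample data standing in for the $subs array built by the parser.
$first = new stdClass;
$first->number    = '1';
$first->startTime = '00:00:00,074';
$first->stopTime  = '00:00:02,564';
$first->text      = "Previously on Breaking Bad...\n";
$subs = array($first);

// Hypothetical helper: renders one sub as number, time and text,
// each followed by <br />, without any print_r()/array() output.
function renderSub($sub)
{
    $time = $sub->startTime . ' --> ' . $sub->stopTime;
    // Escape the text for HTML, then turn inner line breaks into <br />.
    $text = nl2br(htmlspecialchars(rtrim($sub->text)));
    return $sub->number . "<br />\n"
         . htmlspecialchars($time) . "<br />\n"
         . $text . "<br />\n";
}

foreach ($subs as $sub) {
    // The extra <br /> reproduces the blank line between subs.
    echo renderSub($sub), "<br />\n";
}
```

Escaping means formatting tags inside a subtitle, such as <i>, are shown literally instead of being interpreted by the browser. Drop the htmlspecialchars() call around the text if the tags should take effect and the file is trusted.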
    

    Further reading: SubRip Text File Format
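
As an aside on the regular expression in the question: file() returns one array element per line, while the pattern expects the number, the time and the text on a single line separated by spaces, so it never matches. If a regex approach is preferred over the state machine, one option is to run a multi-line pattern over the whole file contents. The following is only a sketch under that assumption. The sample string stands in for file_get_contents('test.srt'), and the pattern assumes well-formed timestamps.

```php
<?php
// Sample input standing in for file_get_contents('test.srt').
$contents = "1\n00:00:00,074 --> 00:00:02,564\nPreviously on Breaking Bad...\n\n"
          . "2\n00:00:02,663 --> 00:00:04,393\nWords...\n";

// Normalize Windows/Mac line endings so the pattern only deals with \n.
$contents = str_replace(array("\r\n", "\r"), "\n", $contents);

// One match per sub: a number line, a time line, then the text up to
// the next blank line or the end of the input.
$pattern = '/^(\d+)\n(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})\n(.*?)(?=\n{2,}|\n*\z)/ms';

preg_match_all($pattern, $contents, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m[1], "<br />\n";                   // number
    echo $m[2], ' --> ', $m[3], "<br />\n";   // start and stop time
    echo nl2br($m[4]), "<br />\n";            // text, not HTML-escaped here
}
```

Compared with the state machine, this is shorter but less forgiving: a sub with a malformed time line is silently skipped rather than reported.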