php regex regex-group

PHP Regex to Capture mm:ss.uuu AND hh:mm:ss.uuu


I'm trying to capture the timestamps of some caption files with some luck.

I've managed to get an expression that will partially capture hh:mm:ss.uuu as shown in #910 below, but am unable to figure out how to capture the groups if the optional h: or hh: is not present. My work so far is at https://regex101.com/r/4QWySg/1. As you can see, it's only capturing after the first hour is encountered.

Any help is appreciated :)

909
59:48.420 --> 1:00:06.450
THERE SHOULD BE AN OPTION TO UNMUTE DO

910
1:00:06.460 --> 1:00:09.870
YOU SEE A MICROPHONE ICON ANYWHERE ON YOUR TEAMS


(^\d+$\R)?(\d{1,2}(?::\d{2}){2}\.\d{2,3})\s*-->\s*(\d{1,2}(?::\d{2}){2}\.\d{2,3})\R((?:[^\r\n]|\r?\n[^\r\n])*)(?:\r?\n\r?\n|$)

Solution

  • You can simplify your regex and get all matches using this in PHP:

    ((?:\d{1,2}:)?\d{2}:\d{2}\.\d{2,3})\s*-->\s*((?1))\R(.+)
    


    RegEx Details:

    • (: Start capture group #1
      • (?:\d{1,2}:)?: Match optional hour digits followed by :
      • \d{2}:\d{2}\.\d{2,3}: Match mm:ss.uuu part
    • ): End capture group #1
    • \s*-->\s*: Match --> surrounded by optional whitespace on both sides
    • ((?1)): recurses the 1st subpattern i.e. match using same pattern as in group #1. Capture this in group #2
    • \R: Match any newline
    • (.+): Match 1+ of any characters except line breaks in the 3rd capture group, i.e. the first line of the caption text
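
To see the pattern working outside regex101, here is a minimal sketch that runs it with preg_match_all against the two cues from the question. The variable names ($captions, $pattern, $matches) are only illustrative, and the sample text is assumed to use plain \n line endings.

```php
<?php
// Sample cues copied from the question; cue 909 starts without an hour part.
$captions = <<<'TXT'
909
59:48.420 --> 1:00:06.450
THERE SHOULD BE AN OPTION TO UNMUTE DO

910
1:00:06.460 --> 1:00:09.870
YOU SEE A MICROPHONE ICON ANYWHERE ON YOUR TEAMS
TXT;

// Pattern from the answer; (?1) reuses group 1's subpattern for the end time.
$pattern = '/((?:\d{1,2}:)?\d{2}:\d{2}\.\d{2,3})\s*-->\s*((?1))\R(.+)/';

// PREG_SET_ORDER gives one array per cue: [full match, start, end, text].
preg_match_all($pattern, $captions, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[1] = start time, $m[2] = end time, $m[3] = first line of caption text
    printf("%s | %s | %s\n", $m[1], $m[2], $m[3]);
}
```

Both cues should match, whether or not the hour part is present. Because . does not match a line break by default, group #3 holds only the first line of a multi-line caption; if you need the whole block, that last group would have to be adjusted.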