I'm trying to capture the timestamps in some caption files, with only partial success.
I've managed to get an expression that partially captures hh:mm:ss.uuu, as shown in cue #910 below, but I can't figure out how to capture the groups when the optional h: or hh: is not present. My work so far is at https://regex101.com/r/4QWySg/1. As you can see, it only starts capturing once the first hour is encountered.
Any help is appreciated :)
909
59:48.420 --> 1:00:06.450
THERE SHOULD BE AN OPTION TO UNMUTE DO
910
1:00:06.460 --> 1:00:09.870
YOU SEE A MICROPHONE ICON ANYWHERE ON YOUR TEAMS
(^\d+$\R)?(\d{1,2}(?::\d{2}){2}\.\d{2,3})\s*-->\s*(\d{1,2}(?::\d{2}){2}\.\d{2,3})\R((?:[^\r\n]|\r?\n[^\r\n])*)(?:\r?\n\r?\n|$)
You can simplify your regex and get all matches using this in PHP:
((?:\d{1,2}:)?\d{2}:\d{2}\.\d{2,3})\s*-->\s*((?1))\R(.+)
RegEx Details:
( : Start capture group #1
(?:\d{1,2}:)? : Match optional hour digits followed by :
\d{2}:\d{2}\.\d{2,3} : Match the mm:ss.uuu part
) : End capture group #1
\s*-->\s* : Match --> surrounded by optional whitespace on both sides
((?1)) : Recurse into the 1st subpattern, i.e. match using the same pattern as group #1, and capture the result in group #2
\R : Match any newline sequence
(.+) : Match 1+ of any character (except newline) in capture group #3, for the caption
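As a rough illustration, here is a minimal sketch of how the pattern above might be applied with preg_match_all. The variable names ($captions, $pattern, $matches) and the heredoc are assumptions for the example; the sample text is copied from the question.

```php
<?php
// Sample cues copied from the question; storing them in a heredoc is an assumption.
$captions = <<<TXT
909
59:48.420 --> 1:00:06.450
THERE SHOULD BE AN OPTION TO UNMUTE DO
910
1:00:06.460 --> 1:00:09.870
YOU SEE A MICROPHONE ICON ANYWHERE ON YOUR TEAMS
TXT;

// Group 1: start time (hour part optional)
// Group 2: end time, matched by re-using group 1's pattern via (?1)
// Group 3: the caption line that follows the timing line
$pattern = '/((?:\d{1,2}:)?\d{2}:\d{2}\.\d{2,3})\s*-->\s*((?1))\R(.+)/';

// PREG_SET_ORDER yields one array per cue: [full match, start, end, caption]
preg_match_all($pattern, $captions, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    printf("%s | %s | %s\n", $m[1], $m[2], $m[3]);
}
```

With the sample above this should print one line per cue, e.g. `59:48.420 | 1:00:06.450 | THERE SHOULD BE AN OPTION TO UNMUTE DO` for cue 909, which has no hour in its start time. Note that (.+) only grabs the first caption line; multi-line captions would need a different third group.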