Given an HLS media playlist as follows:
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-ALLOW-CACHE:NO
#EXT-X-TARGETDURATION:7
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-PROGRAM-DATE-TIME:2022-09-12T10:03:22.621+02:00
#EXTINF:6.666666667,
seg1.ts
#EXT-X-PROGRAM-DATE-TIME:2022-09-12T10:03:29.637+02:00
#EXTINF:6.666666667,
seg2.ts
#EXT-X-PROGRAM-DATE-TIME:2022-09-12T10:03:36.583+02:00
#EXTINF:6.666666666,
seg3.ts
I want to create a regular expression to match the datetime following the EXT-X-PROGRAM-DATE-TIME tag closest to a specified .ts file name. For example, I want to be able to retrieve the datetime 2022-09-12T10:03:29.637+02:00 by specifying that the match should end with seg2.ts. It should work even if new tags are added between the file name and the EXT-X-PROGRAM-DATE-TIME tag in the future.
This pattern (EXT-X-PROGRAM-DATE-TIME:(.*)[\s\S]*?seg2.ts) is my best effort so far, but I can't figure out how to make the match start at the last possible EXT-X-PROGRAM-DATE-TIME tag. The lazy quantifier did not help. The group that is currently captured is the datetime following the first EXT-X-PROGRAM-DATE-TIME tag, i.e. 2022-09-12T10:03:22.621+02:00.
I also looked at using a negative lookahead, but I can't figure out how to combine that with matching a variable number of characters and whitespace before the seg2.ts.
I'm sure this has been answered before in another context, but I just can't find the right search terms.
We can use re.search here along with the regex tempered dot trick:
#Python 2.7.17
import re
inp = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-ALLOW-CACHE:NO
#EXT-X-TARGETDURATION:7
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-PROGRAM-DATE-TIME:2022-09-12T10:03:22.621+02:00
#EXTINF:6.666666667,
seg1.ts
#EXT-X-PROGRAM-DATE-TIME:2022-09-12T10:03:29.637+02:00
#EXTINF:6.666666667,
seg2.ts
#EXT-X-PROGRAM-DATE-TIME:2022-09-12T10:03:36.583+02:00
#EXTINF:6.666666666,
seg3.ts"""
match = re.search(r'#EXT-X-PROGRAM-DATE-TIME:(\S+)(?:(?!EXT-X-PROGRAM-DATE-TIME).)*\bseg2\.ts', inp, flags=re.S)
if match:
    print(match.group(1)) # 2022-09-12T10:03:29.637+02:00
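Here is an explanation of the regex pattern:
#EXT-X-PROGRAM-DATE-TIME:  match the literal tag name
(\S+)  match and capture the timestamp
(?:(?!EXT-X-PROGRAM-DATE-TIME).)*  match all content WITHOUT crossing into the next EXT-X-PROGRAM-DATE-TIME section
\bseg2\.ts  match the filename
The re.S flag lets the dot match newlines, so the tempered dot can span the lines between the tag and the filename.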
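If the file name needs to be a parameter rather than hard-coded, the same tempered dot idea can be wrapped in a small helper. This is only a sketch: the function name program_date_time_for and the sample playlist are made up for illustration, and it assumes each segment URI sits alone on its own line. It uses re.escape so that the dot in the file name is matched literally, and it anchors the name with ^ and $ (via re.M) instead of \b.

```python
import re

def program_date_time_for(playlist, segment_name):
    """Return the EXT-X-PROGRAM-DATE-TIME value closest before segment_name, or None.

    Hypothetical helper; assumes the segment URI is alone on its line.
    """
    pattern = (
        r'#EXT-X-PROGRAM-DATE-TIME:(\S+)'        # capture the timestamp
        r'(?:(?!EXT-X-PROGRAM-DATE-TIME).)*?'    # tempered dot: never cross another tag
        r'^' + re.escape(segment_name) + r'\r?$' # the segment line, escaped literally
    )
    # re.S lets "." match newlines; re.M lets ^ and $ match at line boundaries
    match = re.search(pattern, playlist, flags=re.S | re.M)
    return match.group(1) if match else None

# Made-up sample with an extra tag between the date tag and the file name
sample = "\n".join([
    "#EXTM3U",
    "#EXT-X-PROGRAM-DATE-TIME:2022-09-12T10:03:22.621+02:00",
    "#EXTINF:6.666666667,",
    "seg1.ts",
    "#EXT-X-PROGRAM-DATE-TIME:2022-09-12T10:03:29.637+02:00",
    "#EXT-X-DISCONTINUITY",
    "#EXTINF:6.666666667,",
    "seg2.ts",
])

print(program_date_time_for(sample, "seg2.ts"))  # 2022-09-12T10:03:29.637+02:00
```

A name that does not appear in the playlist gives None, because the tempered dot refuses to run past the next EXT-X-PROGRAM-DATE-TIME tag and no later start position can reach the file name either.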