Tags: regex, powershell, substring, text-parsing

How to extract a substring from an EDL line, between 2 sequences of characters


With PowerShell, I want to extract the marker name from a video markers EDL (Edit Decision List) file. Here is an example of an EDL line:

 |C:ResolveColorBlue |M:The Importance of Planning and Preparedness |D:1

I want everything after |M: and before |D:, assigned to a variable.

I tried this regex:

$MarkerName = [regex]::Match($line, '[^|M:]+(?= |D:)').Value

I expected it to extract everything between |M: and |D:, based on an example I saw here: https://collectingwisdom.com/powershell-substring-after-character/

But it doesn't. It extracts ResolveColorBlue and nothing else.

I also tried to apply what is described in

powershell extract text between two strings

but it doesn't work either. That answer reads from a file, whereas I have already processed the file content to get the string I need to filter.

Where am I going wrong?


Solution

  • Your pattern, [^|M:]+(?= |D:), matches like this:

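    • [^|M:]+ - one or more occurrences (+) of any character other than |, M and : ([^|M:] is a negated character class, so the | inside it is a literal)
    • (?= |D:) - a positive lookahead that requires the match to be immediately followed by either a space or D: (outside a character class, the unescaped | is alternation).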

    With the .NET regex engine, the match really is ResolveColorBlue. Matching can start right after the first :, because there is no |, M or : between that point and the first space. The greedy [^|M:]+ first consumes that space as well, stops at the following | (which [^|M:] cannot match), and then backtracks one character so that the lookahead finds the space. You can see for yourself how the regex engine processes the string at regex101.com (mind the selected .NET regex engine on the left).
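To reproduce the behaviour described above, here is a minimal sketch that runs the original pattern against the EDL line from the question. The variable name `$original` is only illustrative.

```powershell
# The EDL line from the question.
$line = ' |C:ResolveColorBlue |M:The Importance of Planning and Preparedness |D:1'

# Original pattern: inside [...] the | is a literal character,
# but inside (?= |D:) it is alternation ("a space" OR "D:").
$original = [regex]::Match($line, '[^|M:]+(?= |D:)').Value

$original   # ResolveColorBlue
```

The first run of characters that contains no |, M or : and is followed by a space is ResolveColorBlue, which is why the marker name is never reached by this first match.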

    Use

    (?<=\|M:).*?(?=\|D:)
    

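    Or, to keep whitespace next to the delimiters out of the match with the regex itself (note that the match still starts at the first position after |M:, so this reliably drops only the whitespace before |D:; whitespace right after |M: stays in the match unless you use a capture group or trim the result):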

    (?<=\|M:\s*).*?(?=\s*\|D:)
    

    This regex extracts the text between |M: and |D:.

    The pipe must be escaped to match a literal | char.

    More details:

    • (?<=\|M:\s*) - a positive lookbehind that matches a location that is immediately preceded by |M: and zero or more whitespace characters
    • .*? - zero or more characters other than a newline, as few as possible
    • (?=\s*\|D:) - a positive lookahead that matches a location that is immediately followed by zero or more whitespace characters and then |D:.
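Put together in PowerShell, the lookaround pattern can be used as in the sketch below. The second half is an assumption rather than part of the answer above: it shows a capture-group variant with `-cmatch`, where `\s*` sits outside the group so that whitespace on both sides of the marker name is excluded. The name `$MarkerNameFromGroup` is hypothetical.

```powershell
# The EDL line from the question.
$line = ' |C:ResolveColorBlue |M:The Importance of Planning and Preparedness |D:1'

# Lookaround version: the whole match is the marker name.
# .NET allows the variable-length \s* inside the lookbehind.
$MarkerName = [regex]::Match($line, '(?<=\|M:\s*).*?(?=\s*\|D:)').Value

# Capture-group alternative (illustrative): -cmatch is the case-sensitive
# form of -match and fills the automatic $Matches table on success.
$MarkerNameFromGroup = $null
if ($line -cmatch '\|M:\s*(.*?)\s*\|D:') {
    $MarkerNameFromGroup = $Matches[1]
}

$MarkerName            # The Importance of Planning and Preparedness
$MarkerNameFromGroup   # The Importance of Planning and Preparedness
```

Both variants work on a string you already hold in a variable, so no file access is needed at this step.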