regex lookbehind

RegEx: Matching Pattern within Pattern - I think I need to use Positive Lookbehinds?


I'm trying to use RegEx to find a pattern within a pattern. Specifically, I want to capture a URL into a group and, within that URL, also capture everything that comes after the last = sign.

So given this string

<a href="http://my.domain.com/?s_cid=EM&s_ev9=CMC21892&s_ev10=EM_CMC21892_LC_stuff" style="color: #365EBF:">stuff</a>

I would initially find

href="http://my.domain.com/?s_cid=EM&s_ev9=CMC21892&s_ev10=EM_CMC21892_LC_stuff"

Using this RegEx: href="(https?[^"]*)"

From there I could run a second pattern against the captured group to pull out the string I'm looking for, EM_CMC21892_LC_stuff, with this: =[^"=]*$

I am having no success though when I try to combine the two to accomplish it in one RegEx.

Any thoughts?


Solution

  • Using regexes to parse HTML is just asking for trouble.

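    That said, try href="http[^"]+=([^"]+?)". The greedy [^"]+ backtracks only as far as the last = inside the quotes, so the capture group holds everything after it.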
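
    As a rough sketch, here is how that pattern behaves in Python on the sample string from the question. The second pattern is a variant that is not part of the answer above: it assumes you want both the whole URL and the trailing value, so it nests one group inside the other. The names last_value and url_and_value are made up for illustration.

```python
import re

# Sample anchor tag from the question, split across two literals for width.
html = ('<a href="http://my.domain.com/?s_cid=EM&s_ev9=CMC21892'
        '&s_ev10=EM_CMC21892_LC_stuff" style="color: #365EBF:">stuff</a>')

# Pattern from the answer: group 1 is the text after the last "=" in the href.
last_value = re.compile(r'href="http[^"]+=([^"]+?)"')

# Assumed variant (not from the answer): group 1 is the whole URL,
# group 2 is the text after the last "=" inside it.
url_and_value = re.compile(r'href="(https?[^"]*=([^"=]*))"')

m1 = last_value.search(html)
m2 = url_and_value.search(html)

print(m1.group(1))  # value after the last "="
print(m2.group(1))  # full URL
print(m2.group(2))  # value after the last "="
```

    No lookbehind is needed here: nesting the groups lets a single pass return both pieces. Both patterns only match when the href contains at least one = sign.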