I have the following string:
<A href="CarPage.asp?parent=CAR123+++&Color=RED">The Car is Red - Its Fast</a>
And I want to extract:
What I have so far is:
(?<=<A href="CarPage\.asp\?parent=)[A-Za-z0-9]*(\+\+\+&Color=)[A-Za-z0-9]{3}(\">)[A-Za-z0-9\- ]*(?=</a>)
But I'm not sure how to set up positive and negative lookaheads and lookbehinds when they are not at the string boundaries.
I know, it's HTML... I've heard it before: "Don't parse HTML with regex." I don't need anything more elaborate than this.
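A quick way to see the issue is to run the pattern and look at what it returns. This is a minimal sketch in Python's re module, which is an assumption because the question doesn't name a regex engine. It works here only because this lookbehind is fixed width, which Python requires. A lookbehind and a lookahead only assert what sits *outside* the match, so everything between them, including +++&Color= and ">, ends up in the match. The usual fix is to drop the lookarounds and wrap just the wanted pieces in capture groups.

```python
import re

text = '<A href="CarPage.asp?parent=CAR123+++&Color=RED">The Car is Red - Its Fast</a>'

# The pattern from the question: the lookbehind and lookahead only constrain
# the outer edges, so the literal separators in the middle are part of the match.
original = (r'(?<=<A href="CarPage\.asp\?parent=)[A-Za-z0-9]*(\+\+\+&Color=)'
            r'[A-Za-z0-9]{3}(\">)[A-Za-z0-9\- ]*(?=</a>)')
whole = re.search(original, text).group(0)
print(whole)  # CAR123+++&Color=RED">The Car is Red - Its Fast

# Same structure without lookarounds: the separators are matched normally
# and only the three pieces of interest are captured.
grouped = (r'<A href="CarPage\.asp\?parent=([A-Za-z0-9]*)\+\+\+&Color='
           r'([A-Za-z0-9]{3})">([A-Za-z0-9\- ]*)</a>')
m = re.search(grouped, text)
print(m.groups())  # ('CAR123', 'RED', 'The Car is Red - Its Fast')
```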
Help is appreciated.
Thanks!
You'd be better off using a parser, but if your link is always formatted in exactly the same way (no ids, classes, extra params, params in a different order, etc.), try:
parent=(\w+?)\+*&Color=(\w+?)">(.*?)<
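Here is a minimal Python sketch of that pattern applied to the string from the question (Python is an assumption; the pattern itself uses nothing engine-specific). The three capture groups hold the parent id, the colour and the link text.

```python
import re

text = '<A href="CarPage.asp?parent=CAR123+++&Color=RED">The Car is Red - Its Fast</a>'

# Group 1: parent id, group 2: colour, group 3: link text (up to the next "<").
pattern = re.compile(r'parent=(\w+?)\+*&Color=(\w+?)">(.*?)<')
m = pattern.search(text)
parent, color, label = m.groups()
print(parent, color, label)  # CAR123 RED The Car is Red - Its Fast
```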
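The difference from Mu's suggestion is the greediness: the quantifiers here are lazy (+? and *?), so (.*?) stops at the first < instead of running on to the last one on the line.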
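The difference only shows up when there is more text after the link, for example several links on one line. The sketch below is in Python and uses a hypothetical two-link input. The "greedy" pattern is an assumed greedy counterpart of the one above, not necessarily Mu's exact suggestion. With greedy (.*), the third group runs on to the last < on the line and swallows the second link, while the lazy version finds both.

```python
import re

# Hypothetical input: two links on the same line.
line = ('<A href="CarPage.asp?parent=CAR123+++&Color=RED">The Car is Red - Its Fast</a> '
        '<A href="CarPage.asp?parent=CAR456+++&Color=BLU">Another Car</a>')

# Lazy: (.*?) stops at the first "<" after the link text.
lazy = re.findall(r'parent=(\w+?)\+*&Color=(\w+?)">(.*?)<', line)

# Assumed greedy variant: (.*) backtracks only as far as the LAST "<".
greedy = re.findall(r'parent=(\w+)\+*&Color=(\w+)">(.*)<', line)

print(lazy)
# [('CAR123', 'RED', 'The Car is Red - Its Fast'), ('CAR456', 'BLU', 'Another Car')]
print(len(greedy))  # 1
print(greedy[0][2])  # first link text plus the whole second link, up to "Another Car"
```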