regex, html-parsing

Match href and 'a' value in link


How can I match the href and the 'a' value (the link text) in a link?

For example, I want to extract 'www.google.com' and 'test' from the following:

<A HREF="www.google.com/test.html" title="test">test</A>

Here is what I am trying: '<A HREF=(.+).html', but it is not matching.
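The question does not say which regex engine or API is being used. Assuming Python's `re` module (an assumption made only for illustration), this sketch shows two likely reasons the attempt seems not to work: the pattern never accounts for the double quote around the attribute value, and it does not cover the rest of the tag, so a whole-string match fails. The unescaped `.` in `.html` also matches any character, not just a literal dot.

```python
import re

html = '<A HREF="www.google.com/test.html" title="test">test</A>'
pattern = r'<A HREF=(.+).html'

# re.search scans for the pattern anywhere in the string, so it does find a
# match, but the captured group keeps the opening quote and stops before
# ".html", which is not the value the question wants.
m = re.search(pattern, html)
print(repr(m.group(1)))  # → '"www.google.com/test'

# re.fullmatch requires the pattern to cover the whole input. Nothing in the
# pattern accounts for ' title="test">test</A>', so there is no match at all.
print(re.fullmatch(pattern, html))  # → None
```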


Solution

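  • Regular expressions for HTML are brittle: they can break when attribute order, quoting or letter case changes. For this exact input, though, a basic (grep/sed-style) regex, where capture groups are written \( and \), would be: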

    <A HREF="\([^"]*\)"[^>]*>\([^<]*\)</A>
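A minimal Python sketch of the same idea follows. Python writes capture groups as bare parentheses instead of `\(...\)`. The sketch uses `[^"]*` and `[^<]*` so that each group stops at the closing quote or at the next tag instead of running across extra attributes or several links on one line, and it adds `re.IGNORECASE` so lowercase `<a href>` tags also match. The helper name `extract_links` and the host-splitting step are illustrative assumptions. Splitting on the first `/` only yields `www.google.com` for a scheme-less href like the one in the question; an `http://` URL would need proper URL parsing.

```python
import re

# Group 1: the href value (everything up to the closing quote).
# Group 2: the link text (everything up to the next "<").
LINK_RE = re.compile(r'<A HREF="([^"]*)"[^>]*>([^<]*)</A>', re.IGNORECASE)

def extract_links(html):
    """Return (host, href, text) for each <A HREF="...">text</A> in html."""
    results = []
    for m in LINK_RE.finditer(html):
        href, text = m.group(1), m.group(2)
        # Assumes a scheme-less href such as "www.google.com/test.html":
        # the host is everything before the first "/".
        host = href.split("/", 1)[0]
        results.append((host, href, text))
    return results

print(extract_links('<A HREF="www.google.com/test.html" title="test">test</A>'))
# → [('www.google.com', 'www.google.com/test.html', 'test')]
```

Because `finditer` scans the whole string, a line containing several links returns one tuple per link.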