
RegExp: want to find all links that do not end in ".html"


I'm a relative novice with regular expressions (although I've used them successfully many times). I want to find all links in a document that do not end in ".html". The regular expression I came up with is:

href=\"([^"]*)(?<!html)\"

In Notepad++, my editor, href=\"([^"]*)\" finds all the links (both those that end in "html" and those that do not). Why doesn't negative lookbehind work?

I've also tried lookahead:

href=\"[^"]*(?!html\")

but that didn't work either.

Can anybody help?

Cheers, grovel


Solution

  • That regular expression would work fine if you were using Perl or PCRE (e.g. preg_match in PHP). However, lookahead and lookbehind assertions are not supported by most regular expression engines, especially the simpler ones, such as the one used by Notepad++. Only the most basic syntax, such as quantifiers, subpatterns, and character classes, is supported by almost all regular expression engines.

    You can find the documentation for the Notepad++ regular expression engine at: http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Regular_Expressions
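
To illustrate the point about engines that do support lookbehind, here is a minimal sketch in Python, whose built-in re module accepts fixed-width negative lookbehind. The sample string and the helper name non_html_links are made up for this example and do not come from the question.

```python
import re

# The pattern from the question, unchanged apart from dropping the
# unnecessary backslashes before the double quotes.
NON_HTML_LINK = re.compile(r'href="([^"]*)(?<!html)"')


def non_html_links(text):
    """Return href values whose last four characters are not 'html'.

    Hypothetical helper, for illustration only.
    """
    return NON_HTML_LINK.findall(text)


# Made-up sample input; not taken from the original question.
sample = (
    '<a href="index.html">Home</a> '
    '<a href="logo.png">Logo</a> '
    '<a href="contact.php">Contact</a>'
)

print(non_html_links(sample))  # ['logo.png', 'contact.php']
```

Note that the lookbehind as written only checks for "html", so a link ending in "foohtml" would be skipped too. If that matters, (?<!\.html) is probably closer to the intent.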