regex notepad++

Remove all <a> tags with Notepad++


I want to remove all links (<a href=""></a>) from the text, except those whose href attribute is "site.com" (for example).

<a href="site.com">text</a>
<a href="google.com">text</a>
<a href="yandex.com">text</a>

That is, the last two links should be removed. Could you please tell me the correct regular expression for this (in Notepad++)?


Solution

  • First, the .* should be lazy (.*?), because otherwise you will match more than necessary.

    <a href=".*?">.*?</a>
    

    Next, you can use a negative lookahead to prevent <a href="site.com">text</a> from matching. The dot is escaped so that it matches only a literal dot:

    <a href="(?!site\.com">).*?">.*?</a>
    

    If you replace the matches with nothing, only <a href="site.com">text</a> will be left.

    If you want to keep the link text, wrap it in parentheses to capture it, and refer to that group in the replacement:

    <a href="(?!site\.com">).*?">(.*?)</a>
    

    And replace with $1.

    Be sure to select "Regular expression" as the search mode. And if your links span multiple lines, check the ". matches newline" checkbox as well.
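
If you want to sanity-check the pattern outside Notepad++, here is a minimal sketch using Python's re module. The sample string and the variable names are made up for illustration. Two differences from Notepad++ are assumed here: Python writes the back-reference as \1 instead of $1, and the re.DOTALL flag plays the role of ". matches newline".

```python
import re

# Sample input taken from the question (three links, one per line).
html = (
    '<a href="site.com">text</a>\n'
    '<a href="google.com">text</a>\n'
    '<a href="yandex.com">text</a>\n'
)

# Negative lookahead skips any link whose href is exactly "site.com".
# Both .*? parts are lazy so each match stops at the nearest "> and </a>.
LINK = re.compile(r'<a href="(?!site\.com">).*?">(.*?)</a>', re.DOTALL)

# Replace with nothing: the other links disappear entirely.
removed = LINK.sub('', html)

# Replace with group 1: the other links are reduced to their text.
unwrapped = LINK.sub(r'\1', html)

print(removed)
print(unwrapped)
```

In both results the site.com link stays untouched; the first call leaves empty lines where the other links were, and the second leaves only their text.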
