Tags: html, regex, encoding, ampersand

Regex to replace ampersands, but not when they're in a URL


So I have this regex:

&(?!#?[xX]?(?:[0-9a-fA-F]+|\w+);)

That matches every & in a block of text that isn't already the start of an entity reference such as &amp; or &#38;.
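Here is a minimal Python sketch of how this regex is typically applied. The function name `encode_bare_ampersands` is hypothetical and does not come from the question. It replaces each bare & with &amp; and leaves existing entity references alone:

```python
import re

# The question's pattern: an & that is NOT already the start of a
# named (&amp;), decimal (&#38;) or hex (&#x26;) entity reference.
BARE_AMP = re.compile(r"&(?!#?[xX]?(?:[0-9a-fA-F]+|\w+);)")


def encode_bare_ampersands(text: str) -> str:
    """Replace every bare & with &amp;, leaving existing entities untouched."""
    # "&" has no special meaning in a Python replacement string,
    # so "&amp;" is inserted literally.
    return BARE_AMP.sub("&amp;", text)


if __name__ == "__main__":
    print(encode_bare_ampersands("a & b &amp; c &#38; d"))
    # prints: a &amp; b &amp; c &#38; d
```

Applied to the sample string below, it also rewrites the & inside the href, because "&this=4" is not followed by a ";" and so does not look like an entity.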

However, if I have this string:

& & & & & <a href="http://localhost/MyFile.aspx?mything=2&this=4">My Text &</a>
---------------------------------------------------------^

... the marked & also gets targeted, and since I'm using it to replace the &'s with &amp;, the URL then becomes invalid:

http://localhost/MyFile.aspx?mything=2&amp;this=4

D'oh! Does anyone know of a better way of encoding &'s that are not in a URL?


Solution

  • No, the URL does not become invalid. The HTML code becomes:

    <a href="http://localhost/MyFile.aspx?mything=2&amp;this=4">
    

    In other words, markup that was previously not correctly encoded is now correctly encoded. When the browser parses the attribute, it decodes &amp; back to &, so the actual URL that the link contains is:

    http://localhost/MyFile.aspx?mything=2&this=4
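    To see this in practice, here is a small Python sketch using the standard library's html.parser.HTMLParser, which, like a browser, decodes character references in attribute values before handing them to your code. The names HrefCollector and extract_hrefs are made up for illustration:

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the (already decoded) href value of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser has already unescaped entities such as &amp; in attrs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_hrefs(markup: str) -> list[str]:
    parser = HrefCollector()
    parser.feed(markup)
    parser.close()
    return parser.hrefs


if __name__ == "__main__":
    markup = '<a href="http://localhost/MyFile.aspx?mything=2&amp;this=4">My Text &amp;</a>'
    print(extract_hrefs(markup))
    # prints: ['http://localhost/MyFile.aspx?mything=2&this=4']
```

    The encoded &amp; in the markup comes back out as a plain & in the URL, which is exactly what the link should point to.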
    

    So it's not a problem that the & character in the markup gets encoded; on the contrary, the markup is now correct.