Regex to select all URLs in a body of text except a particular URL (Sublime Text)


I have the following sample copy on which I want to run a find and replace in Sublime Text using regex. However, I cannot figure out how to select all the URLs except for one particular URL. This would be easy if I knew what the URLs were, but the only URL I know in advance is the one I don't want to wrap in anchor tags.

Copy Example:

this is example.com.au and this is exampleflowers.com.au and of course another anotherexample.com.au/terms.html, url. Oh no exampleflowers.com.au is in this sentence again.

Ultimately, I want every URL to be wrapped in an anchor (a href) tag, except any URL that contains flowers.com.au.

The simple regex I currently use to match URLs is:

    /\w+(\.[^\s,\.^#]+)+/gi

I have also tried:

    /\w+(?!flowers)(\.[^\s,\.^#]+)+/gi

Any assistance is deeply appreciated.


Solution

  • Your regex matches 1+ word characters (\w+) and then repeats, 1+ times, the capturing group (\.[^\s,\.^#]+), which matches a dot followed by 1+ characters from the negated character class.

    The negative lookahead (?!flowers) is evaluated only after \w+ has matched, where it asserts that flowers is not directly to the right. That assertion succeeds, because \w+ has already consumed all the word characters, including flowers, so the next character is a dot.

    Instead, you could put a negative lookahead at the start of the match to assert that the run of non-whitespace characters to the right does not contain flowers.com.au.
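
    To see this behaviour outside Sublime Text, here is a minimal Python sketch. It assumes Python's re module treats this pattern the same way as Sublime's regex engine, and the name attempt is only illustrative.

```python
import re

# The question's second attempt, copied as-is (without the /.../gi delimiters).
attempt = re.compile(r"\w+(?!flowers)(\.[^\s,\.^#]+)+", re.IGNORECASE)

# Greedy \w+ consumes "exampleflowers"; the lookahead then looks at ".com.au",
# which does not start with "flowers", so the assertion passes.
match = attempt.search("exampleflowers.com.au")
print(match.group(0))
```

    The whole of exampleflowers.com.au is still matched, which is why the lookahead has to be moved in front of \w+.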

    Find

    (?<!\S)(?!\S*flowers\.com\.au)(\w+(?:\.[^\s,.#]+)+)

    Replace

    <a href="$1">$1</a>

    Explanation

    • (?<!\S) Negative lookbehind: assert that what is directly to the left is not a non-whitespace character, so a match can only start at the beginning of the text or after whitespace
    • (?!\S*flowers\.com\.au) Negative lookahead: assert that what is on the right is not 0+ non-whitespace characters followed by flowers.com.au
    • (\w+(?:\.[^\s,.#]+)+) Your regex in a capturing group, referenced as $1 in the replacement; the inner repeated group is made non-capturing
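
    The same find and replace can be tried in Python as a rough sketch. This assumes Python's re module handles the pattern like Sublime Text does. The names URL_EXCEPT_FLOWERS, wrap_urls and sample are made up for illustration, and Sublime's $1 is written as \1 in Python.

```python
import re

# Pattern from the answer above, unchanged.
URL_EXCEPT_FLOWERS = re.compile(
    r"(?<!\S)(?!\S*flowers\.com\.au)(\w+(?:\.[^\s,.#]+)+)"
)


def wrap_urls(text: str) -> str:
    """Wrap each matched URL in an anchor tag, skipping flowers.com.au URLs."""
    # \1 is Python's spelling of Sublime Text's $1 backreference.
    return URL_EXCEPT_FLOWERS.sub(r'<a href="\1">\1</a>', text)


# The example copy from the question.
sample = (
    "this is example.com.au and this is exampleflowers.com.au and of course "
    "another anotherexample.com.au/terms.html, url. Oh no "
    "exampleflowers.com.au is in this sentence again."
)

print(wrap_urls(sample))
```

    On the example copy, only example.com.au and anotherexample.com.au/terms.html are wrapped; both occurrences of exampleflowers.com.au are left untouched, and the trailing comma after terms.html stays outside the link.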

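    Note that your negated character class [^\s,\.^#] can be simplified to [^\s,.#]: a dot needs no escaping inside a character class. The second ^ is a literal caret there (it only negates when it is the first character), so the simplified class no longer excludes ^.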