I am trying to set up my Netscaler device with a Rewrite Policy. One of my requirements is to replace any non-domain URLs with the home page URL... that is, I want the Netscaler to replace all external links on a page being served from behind the device with the home page's URL (ex: https://my.domain.edu). The type of Rewrite Policy I'm trying to configure uses a PCRE-compliant regex engine to find specific text on a web page (multiple matches possible).
good links:
https://your.page.domain.edu -- won't be replaced
http://good.domain.edu -- also won't be replaced
bad links (should be replaced with home page URL):
https://www.google.com
http://not.the.best.example.org
http://another.bad.example.erewhon.edu
https://my.domain.com
I currently have this pattern:
(https?://)(?![\w.-]+\.domain\.edu)
According to the Netscaler's RegEx evaluation tool this matches the bad links above and doesn't match the good links, so it seems to be working... in fact, when I run this on a test page, the Netscaler finds all the URLs I want to replace and leaves the good URLs alone.
The problem is the Netscaler isn't replacing the URLs the way I want: it replaces the (https?://) group with the home page URL but leaves the remaining part of the bad URL. For example, it replaces http://www.google.com with: https://my.domain.eduwww.google.com
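The behavior described above can be reproduced outside the Netscaler. The following is a minimal sketch using Python's `re` module, which is an assumption standing in for the device's PCRE engine (the constructs used here behave the same way in both). It shows that the negative lookahead is zero-width, so only the `(https?://)` group is part of the match and only that part gets replaced.

```python
import re

# Pattern from the question. The negative lookahead (?!...) only checks what
# follows; it does not consume any characters, so the match is the scheme alone.
pattern = re.compile(r"(https?://)(?![\w.-]+\.domain\.edu)")

url = "http://www.google.com"
match = pattern.search(url)

# Text actually covered by the match (and therefore replaced).
matched_text = match.group(0)

# Replacing the match with the home page URL leaves the host name behind.
result = pattern.sub("https://my.domain.edu", url)

print(matched_text)
print(result)
```

To replace the whole URL, the pattern has to consume the host name as well, not just look ahead at it.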
I can configure the Rewrite Policy to replace specific URLs (for example, https://www.google.com), so I know the mechanism works. Obviously, this won't work for the general case.
I've tried enclosing the entire regex in parentheses, but this didn't change anything.
Can a regular expression be written for the general case, to match the entire URL for all domains that aren't mine?
Thanks in advance for any help!
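You can use the following regex. Note that it is anchored with ^ and $, so it assumes each URL is on its own line, as in the test input below: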
^https?:\/\/[\w.-]+(?<!\.domain\.edu)$
with your home page URL as substitution:
https://my.domain.edu
TEST INPUT:
https://www.google.com
http://not.the.best.example.org
http://another.bad.example.erewhon.edu
https://my.domain.com
https://your.page.domain.edu
http://good.domain.edu
TEST OUTPUT:
https://my.domain.edu
https://my.domain.edu
https://my.domain.edu
https://my.domain.edu
https://your.page.domain.edu
http://good.domain.edu
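The test above can be checked with a short Python sketch. Python's `re` module and the `re.MULTILINE` flag (so that `^` and `$` match at each line) are assumptions made for illustration; the Netscaler's own engine and options may differ.

```python
import re

# Regex from the answer: consume the whole host name, then use a negative
# lookbehind to reject hosts that end in ".domain.edu".
pattern = re.compile(r"^https?://[\w.-]+(?<!\.domain\.edu)$", re.MULTILINE)

test_input = "\n".join([
    "https://www.google.com",
    "http://not.the.best.example.org",
    "http://another.bad.example.erewhon.edu",
    "https://my.domain.com",
    "https://your.page.domain.edu",
    "http://good.domain.edu",
])

# Every matching line is replaced by the home page URL in full.
output = pattern.sub("https://my.domain.edu", test_input)
print(output)
```

Because `[\w.-]+` consumes the host name, the whole URL is part of the match, which is what makes the full replacement possible.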
If http/https matters, then use the following regex:
^(https?:\/\/)[\w.-]+(?<!\.domain\.edu)$
with replacement:
\1my.domain.edu
INPUT:
https://www.google.com
http://not.the.best.example.org
http://another.bad.example.erewhon.edu
https://my.domain.com
https://your.page.domain.edu
http://good.domain.edu
OUTPUT:
https://my.domain.edu
http://my.domain.edu
http://my.domain.edu
https://my.domain.edu
https://your.page.domain.edu
http://good.domain.edu
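The scheme-preserving variant can be sketched the same way. Again, Python's `re` module is an assumption standing in for the Netscaler's engine, and the helper name `rewrite_url` is hypothetical. The captured group `\1` carries the original `http://` or `https://` into the replacement.

```python
import re

# Same regex as above, with the scheme captured in group 1.
pattern = re.compile(r"^(https?://)[\w.-]+(?<!\.domain\.edu)$")


def rewrite_url(url: str) -> str:
    """Hypothetical helper: point external URLs at my.domain.edu, keeping the scheme."""
    # \1 re-inserts whichever scheme the original URL used.
    return pattern.sub(r"\1my.domain.edu", url)


external = rewrite_url("http://not.the.best.example.org")
internal = rewrite_url("https://your.page.domain.edu")

print(external)
print(internal)
```

URLs that already end in `.domain.edu` do not match, so they are returned unchanged.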