Tags: regex, regex-negation, regex-group, edgecast

Regex to match all subfolders of a URL, except a few special ones


OK, I'm writing a regex that should match a certain URL path and all subfolders underneath it, with a few exclusions. For context, this is for use inside Verizon Edgecast, which is a CDN caching system. It supports regex, but unfortunately I don't know which regex flavor it uses, and the documentation isn't clear about that either. It seems to support all the core regex features, though, and that should be all I need. Unfortunately, reading the documentation requires an account, but you can get the general idea of Edgecast here: https://www.verizondigitalmedia.com/platform/edgecast-cdn/

So, here is some sample data:

help
help/good
help/better
help/great
help/bad
help/bad/worse

And here is the regex I am using right now:

(^help$|help\/[^bad].*)

link: https://regex101.com/r/CBWUDE/1

Broken down:

( - start capture group
^ - start of string
help - first thing that should match
$ - end of string
| - or
help - another thing that should match
\/ - escaped / so I can match help/
[^bad] - match any single character that isn't b, a, or d
. - any character
* - any number of times (zero or more)
) - end capture group

I would like the first 4 to match, but not the last 2: help/bad and help/bad/worse should not be matches, while help/anythingelse should be a match.

This regex is working for me, except that help/better is not a match. The reason, I'm pretty sure, is that better starts with a character that appears inside 'bad': [^bad] only checks the single character right after help/. If I change 'better' to 'getter', it becomes a match, because it no longer starts with a b.
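To see why help/better fails, here is a quick check of the original pattern against the sample paths. It is a sketch that assumes Python's re module as a stand-in for Edgecast's unknown regex flavor, and assumes each URL path is tested on its own with an unanchored search; the helper name matches and the extra path help/getter are illustrative only.

```python
import re

# The question's original pattern. [^bad] is a character class: it consumes
# exactly ONE character, which must not be 'b', 'a' or 'd'. Everything after
# that single character is accepted by .*.
ORIGINAL = re.compile(r"(^help$|help\/[^bad].*)")

# Sample paths from the question.
SAMPLES = [
    "help",
    "help/good",
    "help/better",
    "help/great",
    "help/bad",
    "help/bad/worse",
]


def matches(pattern: re.Pattern, path: str) -> bool:
    # Hypothetical helper: test one path at a time, so ^ and $ refer to
    # the start and end of that whole path.
    return pattern.search(path) is not None


if __name__ == "__main__":
    for path in SAMPLES + ["help/getter"]:
        print(f"{path:<16} {matches(ORIGINAL, path)}")
```

Here help/better and help/getter differ only in their first letter, and help/great matches even though it contains an a, which is consistent with [^bad] inspecting only the first character after help/.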

So what I really want is for 'bad' to match only the whole word bad, not anything with a b, a, or d in it. I tried using a word boundary to do this, but it isn't giving me the results I need; perhaps I just have the syntax wrong. This is what I tried:

(^help$|help\/[^\bbad\b].*)

But it does not seem to work: the 'bad' URLs are no longer excluded, and help/better still doesn't match. I think it's because / is not a word boundary. I'm positive my problem with the original regex is this part:

[^bad] - match any single character that isn't b, a, or d

My question is: how can I turn [^bad] into something that matches anything that doesn't contain the full string 'bad'?


Solution

  • You're going to want to use a negative lookahead, (?!bad), instead of negating specific letters with [^bad].

    I think (^help$|help\/(?!bad).*) is what you're looking for.

    Edit: if you mean anything containing bad at all, not just help/bad, you can make it (?!.*bad.*). This would prevent you from matching help/matbadtom, for example. Full regex: (^help$|help\/(?!.*bad.*).*)
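A quick way to compare the two patterns from the answer is to run them over the sample data plus a couple of extra paths. This is a sketch using Python's re as a stand-in; whether Edgecast's regex flavor supports lookaheads such as (?!...) is an assumption worth testing there before relying on it. The helper name matches and the extra paths help/matbadtom and help/badge are illustrative.

```python
import re

# Answer's first pattern: right after "help/", the negative lookahead (?!bad)
# makes the match fail if the next three characters are "bad". A lookahead
# consumes nothing, so .* still starts immediately after the slash.
EXCLUDE_BAD_PREFIX = re.compile(r"(^help$|help\/(?!bad).*)")

# Answer's edited pattern: (?!.*bad.*) fails if "bad" appears anywhere after
# "help/". The trailing .* inside the lookahead is not needed; (?!.*bad)
# accepts and rejects exactly the same paths.
EXCLUDE_BAD_ANYWHERE = re.compile(r"(^help$|help\/(?!.*bad.*).*)")

# Sample paths from the question, plus two illustrative extras.
SAMPLES = [
    "help",
    "help/good",
    "help/better",
    "help/great",
    "help/bad",
    "help/bad/worse",
    "help/matbadtom",
    "help/badge",
]


def matches(pattern: re.Pattern, path: str) -> bool:
    # Hypothetical helper: test one path at a time, so ^ and $ refer to
    # the start and end of that whole path.
    return pattern.search(path) is not None


if __name__ == "__main__":
    print(f"{'path':<16} {'(?!bad)':<8} (?!.*bad.*)")
    for path in SAMPLES:
        print(f"{path:<16} {str(matches(EXCLUDE_BAD_PREFIX, path)):<8} "
              f"{matches(EXCLUDE_BAD_ANYWHERE, path)}")
```

Both patterns accept the first four sample paths and reject help/bad and help/bad/worse. Note that (?!bad) only looks at the three characters right after help/, so it also rejects paths such as help/badge, while (?!.*bad.*) rejects bad anywhere after the slash, including help/matbadtom.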