I have this exercise. Given these links:
1. http://example.com/cat1/subcat3/subcat4/tag/this%20is%20page/asdasda?start=130
2. http://example.com/cat1/subcat3/subcat4/tag/this%20is%20pageasdasd
3. example.it/news/tag/this%is%20n%page?adsadsadasd
4. http://example.com/tag/thispage/asdasdasd.-?asds=
5. http://example.com/tag/this%20is%20page/asdasd
6. /tag/this/asdasdasd
7. /tag/asd-asd/feed/this-feed
8. /tag/sd-asd
I need to get these results (for links 1 to 7):
1. http://example.com/tag/this%20is%20page
2. http://example.com/tag/this%20is%20pageasdasd
3. example.it/tag/this%is%20n%page
4. http://example.com/tag/thispage
5. http://example.com/tag/this%20is%20page
6. /tag/this
7. /tag/asd-asd
The eighth link, however, must not be matched by the regex. The same goes for the domain name.
I tried here: https://regex101.com/r/aB5mPn/5, but I can't get it to ignore the last case.
Can anyone help me?
If I am not mistaken, you could add a negative lookahead right before matching /tag/... to assert that what follows is not /tag/ plus a single segment running to the end of the string, which is exactly the eighth case (/tag/sd-asd): (?!\/tag\/[^\/]+$)
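To see what the assertion does on its own, here is a small Python sketch (Python is my choice; the question does not name a language). It keeps only the lookahead followed by a literal /tag/ and tries it on the three path-only links:

```python
import re

# Only the negative lookahead from the answer, followed by a literal "/tag/".
# The lookahead fails (so the whole match fails) when the string is "/tag/"
# plus one segment with no further "/" before the end of the string.
BARE_TAG_EXCLUDED = re.compile(r"(?!/tag/[^/]+$)/tag/")

for link in ["/tag/this/asdasdasd", "/tag/asd-asd/feed/this-feed", "/tag/sd-asd"]:
    # re.match only tries position 0, so the assertion is checked once, at the start
    print(link, BARE_TAG_EXCLUDED.match(link) is not None)
```

Links 6 and 7 still match because another / follows the tag name; /tag/sd-asd does not.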
Your regex could look like:
(?:(?:\/[A-Za-z0-9-]+)?)+(?!\/tag\/[^\/]+$)(\/tag\/[A-Za-z0-9-%]+)(.*)
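One caveat: because the lookahead sits directly in front of the /tag/ group, it also rejects links 2 and 3, whose tag segment contains no further / before the end of the string. Below is a minimal Python sketch of one way around that. It is an adaptation, not the regex above: the lookahead is moved to the start of the string, so only a link that is nothing but /tag/<name> gets skipped. The assumptions are mine, not the question's: the domain is whatever comes before the first / after an optional http:// or https://, and a tag name ends at the next / or ?. The name clean_tag_link is hypothetical.

```python
import re

# Sketch under assumptions (not the exact regex from the answer):
#   - the domain is whatever precedes the first "/" after an optional
#     "http://" or "https://" scheme, and may be empty (links 6-8);
#   - a tag name stops at the next "/" or "?".
TAG_LINK = re.compile(
    r"(?!/tag/[^/?]+$)"        # at position 0: skip a link that is only "/tag/<name>"
    r"((?:https?://)?[^/]*)"   # group 1: optional scheme plus domain
    r"(?:/[^/]+)*?"            # category segments before /tag/, dropped from the result
    r"(/tag/[^/?]+)"           # group 2: "/tag/<name>"
)


def clean_tag_link(link):
    """Return domain + "/tag/<name>", or None if the link must be ignored."""
    match = TAG_LINK.match(link)  # re.match only tries position 0
    if match is None:
        return None
    return match.group(1) + match.group(2)


links = [
    "http://example.com/cat1/subcat3/subcat4/tag/this%20is%20page/asdasda?start=130",
    "http://example.com/cat1/subcat3/subcat4/tag/this%20is%20pageasdasd",
    "example.it/news/tag/this%is%20n%page?adsadsadasd",
    "http://example.com/tag/thispage/asdasdasd.-?asds=",
    "http://example.com/tag/this%20is%20page/asdasd",
    "/tag/this/asdasdasd",
    "/tag/asd-asd/feed/this-feed",
    "/tag/sd-asd",
]

for link in links:
    print(clean_tag_link(link))
```

Run on the eight links, this should print the seven expected results in order and None for /tag/sd-asd. The lazy (?:/[^/]+)*? makes the engine stop at the first /tag/ it finds, and since segments cannot contain /, there is only one way to split the path.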