regex regex-negation regex-lookarounds regex-group regular-language

Remove segments from a URL using a regex replacement


I have this exercise:

Given these links:

1. http://example.com/cat1/subcat3/subcat4/tag/this%20is%20page/asdasda?start=130
2. http://example.com/cat1/subcat3/subcat4/tag/this%20is%20pageasdasd
3. example.it/news/tag/this%is%20n%page?adsadsadasd
4. http://example.com/tag/thispage/asdasdasd.-?asds=
5. http://example.com/tag/this%20is%20page/asdasd
6. /tag/this/asdasdasd
7. /tag/asd-asd/feed/this-feed
8. /tag/sd-asd
  • In the first case the result must be: http://example.com/tag/this%20is%20page
  • In the second case the result must be: http://example.com/tag/this%20is%20pageasdasd
  • In the third case the result must be: example.it/tag/this%is%20n%page
  • In the fourth case the result must be: http://example.com/tag/thispage
  • In the fifth case the result must be: http://example.com/tag/this%20is%20page
  • In the sixth case the result must be: /tag/this
  • In the seventh case the result must be: /tag/asd-asd

The eighth link, however, must not be matched by the regex at all, and neither should the domain name.

I tried to build it here: https://regex101.com/r/aB5mPn/5 but I can't find a way to exclude the last case.

Can anyone help me?


Solution

  • If I am not mistaken, you could add a negative lookahead just before the part that matches /tag/... to assert that what follows is not, as in the eighth case, /tag/sd-asd running to the end of the string: (?!\/tag\/[^\/]+$)

    Your regex could look like:

    (?:(?:\/[A-Za-z0-9-]+)?)+(?!\/tag\/[^\/]+$)(\/tag\/[A-Za-z0-9-%]+)(.*)
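To try the idea outside regex101, here is a minimal Python sketch that runs a substitution on each link as its own string and keeps only capture group 1. One caveat, as far as I can tell: when every link is tested separately, a lookahead placed directly before /tag also rejects link 2, because that link ends in /tag/<segment> just like link 8. The sketch therefore anchors the lookahead with ^ so that it only rejects a string consisting of nothing but /tag/<segment>. That anchoring is a variation of mine, not the regex from the answer, and the names TAG_PATTERN and strip_segments are made up for this example.

```python
import re

# Variation on the answer's pattern (an assumption of this sketch, not the
# original regex): the negative lookahead is anchored with ^ so it rejects
# only a string that is exactly "/tag/<segment>".
TAG_PATTERN = re.compile(
    r"(?!^/tag/[^/]+$)"       # skip a bare "/tag/<segment>" string (link 8)
    r"(?:/[A-Za-z0-9-]+)*"    # path segments in front of /tag, to be dropped
    r"(/tag/[A-Za-z0-9%-]+)"  # group 1: the part to keep
    r"(.*)"                   # group 2: trailing path or query, to be dropped
)


def strip_segments(url: str) -> str:
    """Replace the first match with group 1; unmatched input is returned as-is."""
    return TAG_PATTERN.sub(r"\1", url, count=1)


links = [
    "http://example.com/cat1/subcat3/subcat4/tag/this%20is%20page/asdasda?start=130",
    "http://example.com/cat1/subcat3/subcat4/tag/this%20is%20pageasdasd",
    "example.it/news/tag/this%is%20n%page?adsadsadasd",
    "http://example.com/tag/thispage/asdasdasd.-?asds=",
    "http://example.com/tag/this%20is%20page/asdasd",
    "/tag/this/asdasdasd",
    "/tag/asd-asd/feed/this-feed",
    "/tag/sd-asd",
]

for link in links:
    print(strip_segments(link))
```

The domain is left alone because the match can only begin at a run of /segment pieces that leads directly into /tag/, and the text in front of the match is never part of the replacement. The hyphen sits at the end of the character class ([A-Za-z0-9%-]) so that it is read as a literal in every regex flavor.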