I'm trying to capture different parts of a URL while ignoring parts that only sometimes appear.
I've tried using and extending the regex found here with little luck. https://gist.github.com/ahmadawais/9813c44b7e51c2c3540d2165d6c6cc65
Take these example URLs:
https://res.cloudinary.com/test-site/image/upload/v1619174590/folder/path/cjtdn73cleqagpy4fqza.jpg
https://res.cloudinary.com/test-site/image/upload/ar_1:1,c_fill,f_auto,g_auto,w_700/v1619174590/folder/path/cjtdn73cleqagpy4fqza.jpg
https://res.cloudinary.com/test-site/image/facebook/fb_id
The parts I want to capture are:
res.cloudinary.com : host
test-site : cloudname
upload or facebook : resource_type
v1619174590/rg/collective/media/cjtdn73cleqagpy4fqza.jpg: id
I need to ignore everything between /upload/ and /v. I've accomplished this using //upload/.*?\b(?=v1)/ , but it doesn't handle the case where the resource type is facebook and there is no /v123 segment.
You can use
https?:\/\/(?<host>[^\/]+)\/(?<cloudname>[^\/]+)\/[^\/]+\/(?<resource_type>[^\/]+)(?:\/[^\/,]*,[^\/]*)?\/(?<id>.*)
https?:\/\/([^\/]+)\/([^\/]+)\/[^\/]+\/([^\/]+)(?:\/[^\/,]*,[^\/]*)?\/(.*)
The first regex is compliant with the ECMAScript 2018+ standard that supports named capturing groups, and the second one just contains regular, numbered capturing groups.
See the regex demo.
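As a quick check, here is a minimal sketch that runs the named-group pattern against the three URLs from the question. The helper name parseCloudinaryUrl is hypothetical and used only for illustration; it assumes a runtime with ES2018 named capturing groups.

```javascript
// Named-group pattern from the answer (requires ES2018+).
const cloudinaryUrl = /https?:\/\/(?<host>[^\/]+)\/(?<cloudname>[^\/]+)\/[^\/]+\/(?<resource_type>[^\/]+)(?:\/[^\/,]*,[^\/]*)?\/(?<id>.*)/;

// Hypothetical helper: returns the named groups as a plain object,
// or null when the URL does not match the pattern.
function parseCloudinaryUrl(url) {
  const match = cloudinaryUrl.exec(url);
  return match ? { ...match.groups } : null;
}

const urls = [
  "https://res.cloudinary.com/test-site/image/upload/v1619174590/folder/path/cjtdn73cleqagpy4fqza.jpg",
  "https://res.cloudinary.com/test-site/image/upload/ar_1:1,c_fill,f_auto,g_auto,w_700/v1619174590/folder/path/cjtdn73cleqagpy4fqza.jpg",
  "https://res.cloudinary.com/test-site/image/facebook/fb_id",
];

for (const url of urls) {
  console.log(parseCloudinaryUrl(url));
}
// The first two URLs both give resource_type "upload" and
// id "v1619174590/folder/path/cjtdn73cleqagpy4fqza.jpg";
// the third gives resource_type "facebook" and id "fb_id".
```

The comma-separated transformation segment in the second URL is consumed by the optional non-capturing group, so it never ends up in id.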
Details
- https?:\/\/ - https:// or http://
- ([^\/]+) - Group 1 (host): one or more chars other than /
- \/ - a / char
- ([^\/]+) - Group 2 (cloud name): one or more chars other than /
- \/[^\/]+\/ - a /, one or more chars other than /, and a /
- ([^\/]+) - Group 3 (resource type): one or more chars other than /
- (?:\/[^\/,]*,[^\/]*)? - an optional sequence of
  - \/ - a / char
  - [^\/,]* - zero or more chars other than / and ,
  - , - a comma
  - [^\/]* - zero or more chars other than /
- \/ - a / char
- (.*) - Group 4 (id): the rest of the string.
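To make the breakdown easier to follow, the numbered-group pattern can also be assembled piece by piece. This is only an illustrative sketch: the names parts, numbered, and splitUrl are hypothetical, and String.raw is used just to avoid doubling the backslashes.

```javascript
// Each entry mirrors one item of the breakdown, in order.
const parts = [
  String.raw`https?:\/\/`,            // http:// or https://
  String.raw`([^\/]+)`,               // Group 1: host
  String.raw`\/`,                     // a / char
  String.raw`([^\/]+)`,               // Group 2: cloud name
  String.raw`\/[^\/]+\/`,             // a skipped segment such as "image"
  String.raw`([^\/]+)`,               // Group 3: resource type
  String.raw`(?:\/[^\/,]*,[^\/]*)?`,  // optional segment that contains a comma
  String.raw`\/`,                     // a / char
  String.raw`(.*)`,                   // Group 4: id (rest of the string)
];

const numbered = new RegExp(parts.join(""));

// Hypothetical helper: destructure the numbered groups, skipping the full match.
function splitUrl(url) {
  const [, host, cloudname, resourceType, id] = numbered.exec(url) || [];
  return { host, cloudname, resourceType, id };
}

console.log(splitUrl("https://res.cloudinary.com/test-site/image/facebook/fb_id"));
// → { host: 'res.cloudinary.com', cloudname: 'test-site', resourceType: 'facebook', id: 'fb_id' }
```

The optional group only matches a segment that contains a comma, so a transformation segment with a single parameter (no comma) would not be skipped and would become part of the id.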