Regex groups for a Cloudinary URL

I'm trying to capture different parts of a URL while ignoring parts that only sometimes appear.

I've tried using and extending the regex from this gist, with little luck: https://gist.github.com/ahmadawais/9813c44b7e51c2c3540d2165d6c6cc65

Take these example URLs:

https://res.cloudinary.com/test-site/image/upload/v1619174590/folder/path/cjtdn73cleqagpy4fqza.jpg

https://res.cloudinary.com/test-site/image/upload/ar_1:1,c_fill,f_auto,g_auto,w_700/v1619174590/folder/path/cjtdn73cleqagpy4fqza.jpg

https://res.cloudinary.com/test-site/image/facebook/fb_id

res.cloudinary.com : host

test-site : cloudname

upload or facebook : resource_type

v1619174590/folder/path/cjtdn73cleqagpy4fqza.jpg : id

I need to ignore everything between /upload/ and /v. I've accomplished this using //upload/.*?\b(?=v1)/ , but it doesn't account for the case where the resource type is facebook and there is no /v123 segment.

Solution

You can use

https?:\/\/(?<host>[^\/]+)\/(?<cloudname>[^\/]+)\/[^\/]+\/(?<resource_type>[^\/]+)(?:\/[^\/,]*,[^\/]*)?\/(?<id>.*)
https?:\/\/([^\/]+)\/([^\/]+)\/[^\/]+\/([^\/]+)(?:\/[^\/,]*,[^\/]*)?\/(.*)

The first regex relies on named capturing groups, which require ECMAScript 2018 or later. The second one contains only regular, numbered capturing groups, so it also works in older engines.
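As a rough illustration of the named-group version, here is a minimal JavaScript sketch run against the three URLs from the question. The constant CLOUDINARY_URL and the helper parseCloudinaryUrl are hypothetical names chosen for this example, not part of any Cloudinary library.

```javascript
// Named-group pattern from the answer, written as a JavaScript regex literal.
// Requires an engine with ES2018 named capturing groups.
const CLOUDINARY_URL =
  /https?:\/\/(?<host>[^\/]+)\/(?<cloudname>[^\/]+)\/[^\/]+\/(?<resource_type>[^\/]+)(?:\/[^\/,]*,[^\/]*)?\/(?<id>.*)/;

// Hypothetical helper: returns a plain object with the named groups,
// or null when the URL does not match the pattern.
function parseCloudinaryUrl(url) {
  const match = CLOUDINARY_URL.exec(url);
  return match ? { ...match.groups } : null;
}

// The three example URLs from the question.
const exampleUrls = [
  "https://res.cloudinary.com/test-site/image/upload/v1619174590/folder/path/cjtdn73cleqagpy4fqza.jpg",
  "https://res.cloudinary.com/test-site/image/upload/ar_1:1,c_fill,f_auto,g_auto,w_700/v1619174590/folder/path/cjtdn73cleqagpy4fqza.jpg",
  "https://res.cloudinary.com/test-site/image/facebook/fb_id",
];

for (const url of exampleUrls) {
  console.log(parseCloudinaryUrl(url));
}
```

For the first two URLs, id comes out as v1619174590/folder/path/cjtdn73cleqagpy4fqza.jpg and resource_type as upload. For the third, resource_type is facebook and id is fb_id.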

See the regex demo.

Details

https?:\/\/ - https:// or http://
([^\/]+) - Group 1 (host): one or more chars other than /
\/ - a / char
([^\/]+) - Group 2 (cloud name): one or more chars other than /
\/[^\/]+\/ - a /, then one or more chars other than /, then a /
([^\/]+) - Group 3 (resource type): one or more chars other than /
(?:\/[^\/,]*,[^\/]*)? - an optional non-capturing group that matches a path segment containing at least one comma:
- \/ - a / char
- [^\/,]* - zero or more chars other than / and ,
- , - a comma
- [^\/]* - zero or more chars other than /
\/ - a / char
(.*) - Group 4 (id): the rest of the string.
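The breakdown above can be checked with a short JavaScript sketch that uses the numbered-group version and array destructuring. The names CLOUDINARY_URL_NUMBERED and splitCloudinaryUrl are hypothetical, and the single-parameter URL at the end is an assumed input, not one from the question.

```javascript
// Numbered-group pattern from the answer.
const CLOUDINARY_URL_NUMBERED =
  /https?:\/\/([^\/]+)\/([^\/]+)\/[^\/]+\/([^\/]+)(?:\/[^\/,]*,[^\/]*)?\/(.*)/;

// Hypothetical helper: maps Groups 1-4 to descriptive property names.
function splitCloudinaryUrl(url) {
  const match = url.match(CLOUDINARY_URL_NUMBERED);
  if (!match) return null;
  // match[0] is the whole match; Groups 1-4 follow in order.
  const [, host, cloudname, resourceType, id] = match;
  return { host, cloudname, resourceType, id };
}

// The comma-separated segment is consumed by the optional group, so it is not part of the id.
const withParams = splitCloudinaryUrl(
  "https://res.cloudinary.com/test-site/image/upload/ar_1:1,c_fill,f_auto,g_auto,w_700/v1619174590/folder/path/cjtdn73cleqagpy4fqza.jpg"
);
console.log(withParams.id); // v1619174590/folder/path/cjtdn73cleqagpy4fqza.jpg

// Assumed input: a segment with a single parameter and no comma.
// The optional group requires a comma, so this segment stays in the id.
const singleParam = splitCloudinaryUrl(
  "https://res.cloudinary.com/test-site/image/upload/w_700/v1619174590/folder/path/cjtdn73cleqagpy4fqza.jpg"
);
console.log(singleParam.id); // w_700/v1619174590/folder/path/cjtdn73cleqagpy4fqza.jpg
```

Because the optional group only skips a segment that contains a comma, a URL whose extra segment has just one parameter would likely need a different pattern.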