javascript regex cloudinary

Regex groups for cloudinary url


I'm trying to capture different parts of a URL while ignoring parts that only sometimes appear.

I've tried using and extending the regex found here, with little luck: https://gist.github.com/ahmadawais/9813c44b7e51c2c3540d2165d6c6cc65

Take these examples:

https://res.cloudinary.com/test-site/image/upload/v1619174590/folder/path/cjtdn73cleqagpy4fqza.jpg

https://res.cloudinary.com/test-site/image/upload/ar_1:1,c_fill,f_auto,g_auto,w_700/v1619174590/folder/path/cjtdn73cleqagpy4fqza.jpg

https://res.cloudinary.com/test-site/image/facebook/fb_id

The parts I want to capture are:

res.cloudinary.com : host

test-site : cloudname

upload or facebook : resource_type

v1619174590/folder/path/cjtdn73cleqagpy4fqza.jpg : id

I need to ignore everything between /upload/ and /v. I've accomplished this using //upload/.*?\b(?=v1)/ , but it doesn't account for the case where the resource type is facebook and there is no /v123 segment.


Solution

  • You can use either of these patterns:

    https?:\/\/(?<host>[^\/]+)\/(?<cloudname>[^\/]+)\/[^\/]+\/(?<resource_type>[^\/]+)(?:\/[^\/,]*,[^\/]*)?\/(?<id>.*)
    https?:\/\/([^\/]+)\/([^\/]+)\/[^\/]+\/([^\/]+)(?:\/[^\/,]*,[^\/]*)?\/(.*)
    

    The first regex uses named capturing groups, which require ECMAScript 2018 or later; the second uses only regular, numbered capturing groups.
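As a rough illustration of the named-group version, the sketch below runs it against the three URLs from the question. The helper name parseCloudinaryUrl is made up for this example and is not part of any Cloudinary library.

```javascript
// Named-group pattern from the answer (needs ECMAScript 2018+).
const CLOUDINARY_URL =
  /https?:\/\/(?<host>[^\/]+)\/(?<cloudname>[^\/]+)\/[^\/]+\/(?<resource_type>[^\/]+)(?:\/[^\/,]*,[^\/]*)?\/(?<id>.*)/;

// Hypothetical helper: returns the captured parts, or null if nothing matches.
function parseCloudinaryUrl(url) {
  const match = CLOUDINARY_URL.exec(url);
  // exec() returns null when the URL does not fit the pattern.
  return match ? { ...match.groups } : null;
}

// The three example URLs from the question.
const urls = [
  "https://res.cloudinary.com/test-site/image/upload/v1619174590/folder/path/cjtdn73cleqagpy4fqza.jpg",
  "https://res.cloudinary.com/test-site/image/upload/ar_1:1,c_fill,f_auto,g_auto,w_700/v1619174590/folder/path/cjtdn73cleqagpy4fqza.jpg",
  "https://res.cloudinary.com/test-site/image/facebook/fb_id",
];

for (const url of urls) {
  const parts = parseCloudinaryUrl(url);
  console.log(parts.resource_type, "->", parts.id);
}
// upload -> v1619174590/folder/path/cjtdn73cleqagpy4fqza.jpg
// upload -> v1619174590/folder/path/cjtdn73cleqagpy4fqza.jpg
// facebook -> fb_id
```

The first two URLs give the same id because the comma-separated transformation segment is consumed by the optional non-capturing group.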


    Details

    • https?:\/\/ - https:// or http://
    • ([^\/]+) - Group 1 (host): one or more chars other than /
    • \/ - a / char
    • ([^\/]+) - Group 2 (cloud name): one or more chars other than /
    • \/[^\/]+\/ - a /, then one or more chars other than / (image in the examples, not captured), then a /
    • ([^\/]+) - Group 3 (resource type): one or more chars other than /
    • (?:\/[^\/,]*,[^\/]*)? - an optional sequence of
      • \/ - a / char
      • [^\/,]* - zero or more chars other than / and ,
      • , - a comma
      • [^\/]* - zero or more chars other than /
    • \/ - a / char
    • (.*) - Group 4 (id): the rest of the string.
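The breakdown above can be tried out with the numbered-group version, for engines without named groups. This is a minimal sketch: splitCloudinaryUrl is a hypothetical helper name, and the second URL (a single w_700 transformation) is an invented variation on the question's example. It is there to show one consequence of the breakdown: the optional group requires a comma, so a transformation segment with only one parameter is not skipped and ends up in the id.

```javascript
// Numbered-group pattern from the answer.
const CLOUDINARY_URL_NUMBERED =
  /https?:\/\/([^\/]+)\/([^\/]+)\/[^\/]+\/([^\/]+)(?:\/[^\/,]*,[^\/]*)?\/(.*)/;

// Hypothetical helper: maps groups 1-4 to named fields, or returns null.
function splitCloudinaryUrl(url) {
  const match = url.match(CLOUDINARY_URL_NUMBERED);
  if (!match) return null;
  // match[0] is the full match; groups 1-4 follow in pattern order.
  const [, host, cloudname, resourceType, id] = match;
  return { host, cloudname, resourceType, id };
}

// Comma-separated transformations (from the question): segment is skipped.
const withList = splitCloudinaryUrl(
  "https://res.cloudinary.com/test-site/image/upload/ar_1:1,c_fill,f_auto,g_auto,w_700/v1619174590/folder/path/cjtdn73cleqagpy4fqza.jpg"
);

// Single transformation without a comma (invented variation): not skipped.
const withSingle = splitCloudinaryUrl(
  "https://res.cloudinary.com/test-site/image/upload/w_700/v1619174590/folder/path/cjtdn73cleqagpy4fqza.jpg"
);

console.log(withList.id);   // v1619174590/folder/path/cjtdn73cleqagpy4fqza.jpg
console.log(withSingle.id); // w_700/v1619174590/folder/path/cjtdn73cleqagpy4fqza.jpg
```

If single-parameter transformations can occur in your URLs, the optional group would need to be adapted; how best to do that depends on which URL shapes you have to support.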