How can I parse an Azure Blob URI in Node.js/JavaScript?


I need to parse an Azure Blob URI in Node.js and extract the storage account name, container name, and blob name.

I looked through both azure-sdk-for-node and azure-storage-node but found no method for doing so.

I would also like to detect when a Blob URI is invalid, so a regex (if possible) would probably be a good way to go.

Some examples of Blob URI:

  1. https://myaccount.blob.core.windows.net/mycontainer/myblob
  2. http://myaccount.blob.core.windows.net/myblob
  3. https://myaccount.blob.core.windows.net/$root/myblob

Solution

  • Following the naming specification from Azure, I came up with the following function, which uses a regex to parse the blob URI and throws an error if the blob URI is invalid.

    The storage account name and container name patterns should be precise. I left the blob name pattern somewhat loose, since its rules are more complex to define.

    /**
     * Validates and parses given blob uri and returns storage account, 
     * container and blob names.
     * @param {string} blobUri - Valid Azure storage blob uri.
     * @returns {Object} With following properties:
     *   - {string} storageAccountName
     *   - {string} containerName
     *   - {string} blobName
     * @throws {Error} If blobUri is not valid blob uri.
     */
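    const parseAzureBlobUri = (blobUri) => {
      const ERROR_MSG_GENERIC = 'Invalid blob uri.'
    
      // 3-24 characters, lowercase letters and digits only.
      const storageAccountRegex = /[a-z0-9]{3,24}/
      // 3-63 characters, lowercase letters, digits and hyphens; starts and ends
      // with a letter or digit; no consecutive hyphens. The lookahead only scans
      // the container segment, so "--" inside the blob name is not rejected.
      const containerRegex = /[a-z0-9](?![a-z0-9-]*--)[a-z0-9-]{1,61}[a-z0-9]/
      const blobRegex = /.{1,1024}/  // TODO: Consider making this one more precise.
      // String.raw keeps the backslashes, so "\." and "\$" reach the RegExp as
      // escapes. In a plain template literal they collapse to "." and "$", which
      // makes "$root" unmatchable and lets "." match any character.
      const blobUriRegex = new RegExp(
        String.raw`^https?:\/\/(${ storageAccountRegex.source })\.blob\.core\.windows\.net\/`
        + String.raw`(?:(\$root|(?:${ containerRegex.source }))\/)?(${ blobRegex.source })$`
      )
      const match = blobUriRegex.exec(blobUri)
      if (!match) throw new Error(ERROR_MSG_GENERIC)
    
      return {
        storageAccountName: match[1],
        // If not specified, then it is implicitly root container with name $root.
        containerName: match[2] || '$root',
        blobName: match[3]
      }
    }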
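
As an alternative to one large pattern, the same checks can be sketched on top of the built-in `URL` class, which separates the scheme, host, path and query string before any regex runs. The names `parseBlobUriWithUrl` and `BLOB_HOST_SUFFIX` are made up for this illustration, and the sketch assumes the public `blob.core.windows.net` endpoint. Two choices here are assumptions rather than anything stated above: the first path segment is treated as the container whenever the path contains a slash, and the blob name is returned percent-encoded, exactly as it appears in the URI.

```javascript
// Hypothetical constant: host suffix of the public Azure Blob endpoint.
const BLOB_HOST_SUFFIX = '.blob.core.windows.net'
const INVALID_BLOB_URI = 'Invalid blob uri.'

// Sketch only: parse a blob URI with the WHATWG URL class plus small regexes.
const parseBlobUriWithUrl = (blobUri) => {
  let url
  try {
    url = new URL(blobUri) // throws TypeError on input that is not a URL at all
  } catch (err) {
    throw new Error(INVALID_BLOB_URI)
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    throw new Error(INVALID_BLOB_URI)
  }
  // Note: URL lowercases the hostname, so an upper-case account name in the
  // input is normalized here instead of being rejected.
  if (!url.hostname.endsWith(BLOB_HOST_SUFFIX)) throw new Error(INVALID_BLOB_URI)

  const storageAccountName = url.hostname.slice(0, -BLOB_HOST_SUFFIX.length)
  if (!/^[a-z0-9]{3,24}$/.test(storageAccountName)) throw new Error(INVALID_BLOB_URI)

  // Drop the leading "/" and split at the first remaining slash, if any.
  const path = url.pathname.slice(1)
  const slash = path.indexOf('/')
  // Assumption: no slash means the blob lives in the implicit $root container.
  const containerName = slash === -1 ? '$root' : path.slice(0, slash)
  const blobName = slash === -1 ? path : path.slice(slash + 1)

  const containerOk = containerName === '$root'
    || /^[a-z0-9](?!.*--)[a-z0-9-]{1,61}[a-z0-9]$/.test(containerName)
  if (!containerOk) throw new Error(INVALID_BLOB_URI)
  // Loose check, as in the regex version: only the length is validated.
  if (blobName.length < 1 || blobName.length > 1024) throw new Error(INVALID_BLOB_URI)

  return { storageAccountName, containerName, blobName }
}

// Usage with the three example URIs from the question.
console.log(parseBlobUriWithUrl('https://myaccount.blob.core.windows.net/mycontainer/myblob'))
console.log(parseBlobUriWithUrl('http://myaccount.blob.core.windows.net/myblob'))
console.log(parseBlobUriWithUrl('https://myaccount.blob.core.windows.net/$root/myblob'))
```

One side effect of this approach is that a query string, such as a SAS token appended to the URI, ends up in `url.search` and so does not become part of `blobName`.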