Tags: azure, google-cloud-storage, azure-storage, azcopy

AzCopy interprets the source as a local path and prepends the current directory when it is a Google Cloud Storage HTTPS URL


We want to copy files from Google Cloud Storage to Azure Storage. We followed this guide: https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-google-cloud

We run this command:

azcopy copy 'https://storage.googleapis.com/telia-ddi-delivery-plaace/activity_daily_al1_20min/' 'https://plaacedatalakegen2.blob.core.windows.net/teliamovement?<SASKEY>' --recursive=true

We get this error:

INFO: Scanning...
INFO: Any empty folders will not be processed, because source and/or destination doesn't have full folder support
failed to perform copy command due to error: cannot start job due to error: cannot scan the path /Users/peder/Downloads/https:/storage.googleapis.com/telia-ddi-delivery-plaace/activity_daily_al1_20min, please verify that it is a valid.

It seems to us that AzCopy interprets the source as a local path and therefore prepends the directory we run it from, which is /Users/peder/Downloads/. However, we cannot find any argument to indicate that the source is a web location, and our command has the same form as the one in the guide:

azcopy copy 'https://storage.cloud.google.com/mybucket/mydirectory' 'https://mystorageaccount.blob.core.windows.net/mycontainer/mydirectory' --recursive=true
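The path in the error above looks like plain path arithmetic: the working directory joined with the URL string. The Go sketch below reproduces it under the assumption that an unrecognized source is joined onto the current directory with filepath.Join; resolveAsLocal is a hypothetical helper, not AzCopy's actual code. On a Unix-like system it prints the same path as in the error, because filepath.Join cleans its result, collapsing the "//" after "https:" into a single slash and dropping the trailing slash.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// resolveAsLocal is a hypothetical helper that treats source as a path
// relative to cwd. It only illustrates the string transformation; it is
// not taken from the AzCopy code base.
func resolveAsLocal(cwd, source string) string {
	// filepath.Join calls filepath.Clean on the result, which replaces
	// repeated separators with one and removes the trailing separator.
	return filepath.Join(cwd, source)
}

func main() {
	src := "https://storage.googleapis.com/telia-ddi-delivery-plaace/activity_daily_al1_20min/"
	fmt.Println(resolveAsLocal("/Users/peder/Downloads", src))
}
```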

What we have tried:

  • We are running this in Terminal on a Mac, and we also tried PowerShell for Mac.
  • We have tried both single and double quotes.
  • We copied the Azure Storage URL with the SAS key from the console to make sure its syntax is correct.
  • We tried cp instead of copy, since the azcopy help page uses that.

Is there anything wrong with our command? Or could AzCopy have changed since the guide was written?

I also created an issue for this on the Azure Documentation git page: https://github.com/MicrosoftDocs/azure-docs/issues/78890


Solution

  • You're running into this issue because the host storage.cloud.google.com is hard-coded in the AzCopy source code as the Google Cloud Storage endpoint. From the AzCopy source:

    const gcpHostPattern = "^storage.cloud.google.com"
    const invalidGCPURLErrorMessage = "Invalid GCP URL"
    const gcpEssentialHostPart = "google.com"
    

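    Because you're using storage.googleapis.com instead of storage.cloud.google.com, AzCopy does not recognize the URL as a Google Cloud Storage endpoint. It treats the value as a relative path in your local file system and prepends the current directory, which is why the path in the error starts with /Users/peder/Downloads/https:/. Using the storage.cloud.google.com form of the URL, as in the guide's example, should avoid this.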
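A minimal Go sketch of the host check, assuming AzCopy matches gcpHostPattern against the URL's host (the real check may be structured differently). isGCPHost and toCloudGoogleHost are hypothetical helpers for illustration. Note that the dots in the quoted pattern are unescaped, so each matches any character, but storage.googleapis.com still fails because "googleapis" is not "cloud". The rewrite helper swaps the host while keeping the bucket and object path, assuming AzCopy accepts the storage.cloud.google.com form as the guide shows.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// gcpHostPattern is copied from the AzCopy snippet quoted above.
// The dots are unescaped, so each one matches any single character.
const gcpHostPattern = "^storage.cloud.google.com"

var gcpHostRegex = regexp.MustCompile(gcpHostPattern)

// isGCPHost reports whether the host of rawURL matches gcpHostPattern.
// Hypothetical, simplified version of the check; not AzCopy's function.
func isGCPHost(rawURL string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	return gcpHostRegex.MatchString(u.Host)
}

// toCloudGoogleHost rewrites a storage.googleapis.com URL to the
// storage.cloud.google.com host, leaving bucket and path untouched.
// Hypothetical helper for illustration.
func toCloudGoogleHost(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	if u.Host == "storage.googleapis.com" {
		u.Host = "storage.cloud.google.com"
	}
	return u.String(), nil
}

func main() {
	src := "https://storage.googleapis.com/telia-ddi-delivery-plaace/activity_daily_al1_20min/"
	fmt.Println(isGCPHost(src)) // false: host not recognized as GCS

	fixed, err := toCloudGoogleHost(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(fixed)
	fmt.Println(isGCPHost(fixed)) // true
}
```

With the rewritten URL as the source, the azcopy copy command keeps the same shape as the guide's example.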