I'm using DVC to track and version data that is stored locally on the file system and in Azure Blob storage.
My setup is as follows:
- DataProject1 uses a local file location as a remote, so it does not require any authentication.
- DataProject2 uses Azure Blob Storage as a remote and authenticates with a sas_token. I can push and pull data to and from the remote when I'm working within this project.
- MLProject uses dvc import to import data from DataProject1 and DataProject2.
When I run the import command against DataProject1, everything works fine:
dvc import -o 'data/project1' 'https://company.visualstudio.com/DefaultCollection/proj/_git/DataProject1' 'data/project1'
- Successful
However, when I run a similar command against DataProject2:
dvc import -o 'data/project2' 'https://company.visualstudio.com/DefaultCollection/proj/_git/DataProject2' 'data/project2'
- it fails with:
ERROR: unexpected error - Operation returned an invalid status 'This request is not authorized to perform this operation using this permission.' ErrorCode:AuthorizationPermissionMismatch.
I would like to configure dvc import so that I can set the required sas_token, but I cannot find a way to do that.
This happens because DVC does not use MLProject's config when, during the import, it clones DataProject2 and runs dvc fetch in it. So it doesn't know where to find the token (clearly, it's not in the Git repo, right?).
There are a few ways to specify it: global/system configs and/or environment variables.
To implement the first option:
On the machine where you run dvc import, you could create a remote in the --global or --system config with the same name as the remote in DataProject2 and specify the token there. The global config fields will be merged with the config in the DataProject2 repo when DVC pulls the data to import.
dvc remote add --global <DataProject2-remote-name> azure://DataProject2/storage
dvc remote modify --global <DataProject2-remote-name> account_name <name>
dvc remote modify --global <DataProject2-remote-name> sas_token <token>
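If you have to repeat this on several machines, the three commands can be wrapped in a small shell function. This is only a sketch: configure_global_azure_remote is a hypothetical helper, not a DVC command, and its arguments are placeholders for the values from DataProject2. Running dvc remote list inside a clone of DataProject2 shows the remote name it uses.

```shell
#!/bin/sh
# Hypothetical helper (not part of DVC): stores the Azure credentials for a
# remote in the user-wide (--global) DVC config so that `dvc import` can merge
# them with the remote definition coming from DataProject2's repo.
#   $1 = remote name (must match the name in DataProject2's .dvc/config)
#   $2 = remote URL, e.g. azure://DataProject2/storage
#   $3 = Azure storage account name
#   $4 = SAS token
configure_global_azure_remote() {
  if [ "$#" -ne 4 ]; then
    echo "usage: configure_global_azure_remote NAME URL ACCOUNT TOKEN" >&2
    return 2
  fi
  # Same three commands as above; stop at the first one that fails.
  dvc remote add --global "$1" "$2" &&
    dvc remote modify --global "$1" account_name "$3" &&
    dvc remote modify --global "$1" sas_token "$4"
}
```

For example, configure_global_azure_remote <DataProject2-remote-name> azure://DataProject2/storage <name> <token>, then rerun the dvc import command for DataProject2.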
The second option is to set environment variables in the shell where you run dvc import:
export AZURE_STORAGE_SAS_TOKEN='mysecret'
export AZURE_STORAGE_ACCOUNT='myaccount'
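DVC only sees variables that are actually exported into its environment, so a missing or unexported variable is easy to overlook. A minimal fail-fast check before the import could look like this; require_azure_env is a hypothetical helper, and the variable names are the two from the export lines above.

```shell
#!/bin/sh
# Hypothetical helper (not part of DVC): verify that the Azure credential
# variables are exported and non-empty before running `dvc import`.
# printenv only sees exported variables, which are also the only ones
# the dvc child process inherits.
require_azure_env() {
  missing=0
  for var in AZURE_STORAGE_ACCOUNT AZURE_STORAGE_SAS_TOKEN; do
    if [ -z "$(printenv "$var")" ]; then
      echo "missing environment variable: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (command taken from the question):
# require_azure_env && dvc import -o 'data/project2' \
#   'https://company.visualstudio.com/DefaultCollection/proj/_git/DataProject2' 'data/project2'
```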
Please give it a try and let me know whether it works.