dvc

How to use different remotes for different folders?


I want my data and models stored in separate Google Cloud buckets. The idea is that I want to be able to share the data with others without sharing the models.

One idea I can think of is using separate git submodules for data and models. But that feels cumbersome and imposes additional requirements on the end user (e.g. having to run git submodule update).

So can I do this without using git submodules?


Solution

  • You can first add the DVC remotes you want (say you call them data and models, each one pointing to a different Google Cloud bucket), but don't set either of them as the project's default. That way, dvc push won't work without the -r (or --remote) option.

    You would then need to push each directory or file individually to the appropriate remote, like dvc push data/ -r data and dvc push model.dat -r models.

    Note that there is also a feature request on the DVC repo to make this configurable. See "Specify file types that can be pushed to remote".
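
The steps above can be sketched as a small helper script, so that collaborators don't have to remember which -r flag goes with which path. This is only a sketch: the bucket URLs and the function names setup_remotes and push_all are made up for illustration, while the remote names (data, models) and the paths (data/, model.dat) are the ones used in the answer.

```shell
#!/bin/sh
# Hypothetical bucket URLs; replace them with your own.
DATA_URL="gs://my-data-bucket/dvcstore"
MODELS_URL="gs://my-models-bucket/dvcstore"

# Register both remotes. No -d/--default flag is passed,
# so neither remote becomes the project's default.
setup_remotes() {
    dvc remote add data "$DATA_URL" &&
    dvc remote add models "$MODELS_URL"
}

# Push each tracked path to its own remote.
push_all() {
    dvc push data/ -r data &&
    dvc push model.dat -r models
}
```

Source the script, run setup_remotes once per repository, and run push_all whenever the tracked files change. Someone who only has access to the data bucket can then fetch just that part with dvc pull data/ -r data.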