directed-acyclic-graphs, data-pipeline, dvc

Is dvc.yaml supposed to be written by hand or generated by the dvc run command?


I'm trying to understand DVC. Most tutorials mention generating dvc.yaml by running the dvc run command.

At the same time, the dvc.yaml format, which defines the DAG, is well documented. The fact that it is YAML, and therefore human-readable and human-writable, suggests that it is meant to be a DSL for specifying your data pipeline.

Can somebody clarify which is the better practice: writing dvc.yaml by hand, or letting the dvc run command generate it? Or is it left to the user's choice, with no technical difference?


Solution

  • I'd recommend manual editing as the main route! (I believe that has been the officially recommended approach since DVC 2.0.)

    dvc stage add (the successor to dvc run) can still be very helpful for generating pipeline files programmatically, but it doesn't support every feature of dvc.yaml. For example, it cannot set vars values or define foreach stages.
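
    As a rough illustration of the two features mentioned above, a hand-written dvc.yaml might look like the sketch below. The script name train.py, the data and model paths, the region names, and the epochs value are all hypothetical; only the vars, foreach/do, and ${...} interpolation syntax are DVC's own.

    ```yaml
    # dvc.yaml (hand-written sketch; every name and path here is hypothetical)
    vars:
      - epochs: 10              # a value defined inline, reusable via ${epochs}

    stages:
      train:
        foreach:                # one stage is generated per list element
          - us
          - uk
        do:
          cmd: python train.py --region ${item} --epochs ${epochs}
          deps:
            - train.py
            - data/${item}.csv
          outs:
            - models/${item}.pkl
    ```

    With a list under foreach, DVC expands this into one stage per element (here train@us and train@uk), and dvc repro runs them like any other stage.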