I'm running experiments on a model, with a workflow like this:
I'm using Git and Scientific Reproducibility as a guide, where the results of an experiment are stored in a table alongside the hash of the commit. I would like to store the results in directories instead, naming each directory after the commit hash.
Thinking about version control, I would like to isolate the code and analysis. For example, changing the color of a plot in an IPython notebook in analysis shouldn't change anything in code.
The approach I'm thinking:
A directory structure like this:
model
  - code
  - simulation_results
    - a83bc4
    - 23e900
    - etc
  - analysis
and different Git repositories for code and analysis, leaving simulation_results out of Git.
Any comments? A better solution? Thanks.
That seems sound, and your structure would be a good fit for git submodules, with model becoming a parent Git repo.
That way, the model repo records the SHA1 of both the code and analysis submodules, linking them together.
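As a rough sketch of that setup (not the only way to do it), the Python below lists the Git commands that would turn model into a parent repo with code and analysis as submodules. The function names and the two repository URLs are hypothetical placeholders. It also assumes Git is installed, that a commit identity is configured, and that overwriting model/.gitignore is acceptable.

```python
import subprocess
from pathlib import Path


def submodule_setup_commands(code_url, analysis_url):
    """Return the git commands that make `model` a parent repo.

    code_url and analysis_url are hypothetical: replace them with the
    locations of your own code and analysis repositories.
    """
    return [
        ["git", "init"],
        ["git", "submodule", "add", code_url, "code"],
        ["git", "submodule", "add", analysis_url, "analysis"],
        ["git", "add", ".gitignore", ".gitmodules", "code", "analysis"],
        ["git", "commit", "-m", "Track code and analysis as submodules"],
    ]


def set_up_parent_repo(model_dir, code_url, analysis_url):
    """Create the parent repo in model_dir and run the commands above."""
    model = Path(model_dir)
    model.mkdir(parents=True, exist_ok=True)
    # Keep simulation results out of version control (overwrites .gitignore).
    (model / ".gitignore").write_text("simulation_results/\n")
    for cmd in submodule_setup_commands(code_url, analysis_url):
        subprocess.run(cmd, cwd=model, check=True)


# Example call (needs git and reachable repository URLs):
# set_up_parent_repo("model", "<url of code repo>", "<url of analysis repo>")
```

After a commit in code or analysis, a further commit in model is needed to record the new submodule SHA1.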
That means you can name each directory inside the private (i.e. not versioned) directory model/simulation_results after the SHA1 of the model repo (the "parent" repo): that SHA1 pins the SHA1 of both the code and analysis submodules, which means you can reproduce the experiment exactly (based on the exact content of both code and analysis).
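A minimal sketch of how a run script might derive the results directory from the parent repo's SHA1 follows. The helper names are hypothetical, and the 6-character directory names simply mirror the examples in the question (a83bc4, 23e900); a longer prefix or the full SHA1 is safer against collisions. The cleanliness check assumes simulation_results is ignored by the parent repo.

```python
import subprocess
from pathlib import Path


def current_commit(repo_dir):
    """Return the full SHA1 of HEAD in the repository at repo_dir."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def has_uncommitted_changes(repo_dir):
    """True if the working tree (including submodules) differs from HEAD."""
    # `git status --porcelain` prints nothing when the tree is clean.
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return bool(result.stdout.strip())


def results_dir_for(model_dir, sha, length=6):
    """Create and return model_dir/simulation_results/<sha prefix>."""
    path = Path(model_dir) / "simulation_results" / sha[:length]
    path.mkdir(parents=True, exist_ok=True)
    return path


# Example use inside a run script (requires a git repository at "model"):
# if has_uncommitted_changes("model"):
#     raise SystemExit("Commit your changes before running an experiment.")
# out_dir = results_dir_for("model", current_commit("model"))
```

Refusing to run with uncommitted changes is a design choice: without it, the directory name would point at a commit that does not match the code that actually produced the results.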