python mlflow kedro mlops

Logging the git_sha as a parameter in MLflow using Kedro hooks


I would like to log the git_sha parameter in MLflow, as shown in the documentation. As I understand it, running the following code should be enough to get git_sha logged in the MLflow UI. Am I right?

    @hook_impl
    def before_pipeline_run(self, run_params: Dict[str, Any]) -> None:
        """Hook implementation to start an MLflow run
        with the same run_id as the Kedro pipeline run.
        """
        mlflow.start_run(run_name=run_params["run_id"])
        mlflow.log_params(run_params)

But this does not work: I get every parameter except git_sha. Looking at the hook specs, it seems that this parameter is not part of run_params (anymore?).

Is there a way to get the git SHA (maybe from the context journal?) and add it to the logged parameters?

Thank you in advance!


Solution

  • Whilst using git with Kedro is heavily encouraged, it is not required, and as such no part of Kedro (except kedro-starters, if we're being pedantic) is 'aware' of git.

    In your before_pipeline_run hook, it is pretty easy to retrieve the info yourself via the techniques documented here. It is trivial for the whole codebase, and a bit more involved if you want to provide, say, pipeline-specific hashes.
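
As a rough illustration of the whole-codebase case, here is a minimal sketch that asks git for the current commit with `git rev-parse HEAD` and merges the result into the parameters before they are logged. The helper names `get_git_sha` and `with_git_sha` are made up for this example and are not part of Kedro or MLflow. The sketch also assumes the pipeline is run from inside the git working tree and that the `git` executable is on the PATH; when either assumption fails, it falls back to None.

```python
import subprocess
from typing import Any, Dict, Optional


def get_git_sha(repo_dir: str = ".") -> Optional[str]:
    """Return the hash of the current commit, or None if it cannot be read.

    Hypothetical helper: shells out to `git rev-parse HEAD` in repo_dir.
    """
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        # Not a git repository, git is not installed, or repo_dir is missing.
        return None
    return result.stdout.strip() or None


def with_git_sha(run_params: Dict[str, Any], repo_dir: str = ".") -> Dict[str, Any]:
    """Return a copy of run_params with an extra "git_sha" entry.

    Hypothetical helper: the input dictionary is left untouched.
    """
    params = dict(run_params)
    params["git_sha"] = get_git_sha(repo_dir)
    return params
```

In the hook from the question, this would amount to calling `mlflow.log_params(with_git_sha(run_params))` instead of `mlflow.log_params(run_params)`. Note that the hash describes the last commit only, so uncommitted changes in the working tree are not reflected in it. For pipeline-specific hashes, one possible starting point is `git log -n 1 --format=%H -- <path>`, which reports the last commit that touched a given path, though mapping pipelines to paths is left to the project.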