I am using kedro to run a comparative analysis.
I am using the quarto Python package, which wraps the quarto CLI through its render function. This function takes a qmd file as input and generates an HTML report from it, executing the Python chunks along the way.
In a quarto report I have some chunks that evaluate output_var1 and output_var2, for example:
plot_function(output_var1)
plot_function(output_var2)
where output_var1 and output_var2 are, for example, pandas DataFrames (they could be any type of data).
At the end of the pipeline, I would like to render my report with quarto using the outputs of my pipeline, without saving them to the data catalog.
from kedro.pipeline import Pipeline, node, pipeline
from quarto import render


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline([
        node(func=function1,
             inputs='my_input',
             outputs="output_var1"),
        node(func=function2,
             inputs='my_input',
             outputs="output_var2"),
        node(func=render,
             inputs='params:my_quarto_report',  # path to a quarto report *.qmd
             outputs=None),
    ])
In this example my_input is described in the data catalog, but neither output_var1 nor output_var2 is.
The above example fails because I don't know how to pass output_var1 and output_var2 to quarto. How could this be done? Does quarto have a way to pass complex variables such as a DataFrame? I understand how to pass simple text or numerical variables, but I don't see how to pass something that does not fit on the command line.
After some tinkering I reached a decent solution: I cannot pass complex variables directly to quarto, but I can make the node that generates the report depend on other kedro catalog items by passing them as kwargs to the node that calls the quarto render function. The report itself then loads those items from the catalog. Here is an example of a generate_reports kedro pipeline generating a report that depends on output_var, which was generated in a different pipeline/node.
conf/base/catalog.yml:
output_var_catalog_entry:
  type: pickle.PickleDataSet
  filepath: data/07_model_output/output_var.pkl
conf/base/parameters.yml:
report_filename: notebooks/report.qmd
notebooks/report.qmd:
---
jupyter: python3
title: My title
---
Some explanations
```{python}
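from kedro.config import ConfigLoader
from kedro.io import DataCatalog

# Build the data catalog from the project's configuration
conf_loader = ConfigLoader('conf')
conf_catalog = conf_loader.get("catalog.yml")
catalog = DataCatalog.from_config(conf_catalog)
# Load the dataset written by the upstream pipeline
output_var = catalog.load("output_var_catalog_entry")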
some_plot(output_var)
```
src/project_name/pipelines/generate_reports/nodes.py:
from quarto import render


def generate_report(report: str, **kwargs):
    # kwargs are not passed to render; they only declare dataset dependencies
    print("This report depends on:")
    for kw in kwargs:
        print(kw)
    render(report)
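To make the role of the kwargs concrete, here is a minimal stand-alone sketch of the same node logic. It is only an illustration: the render_fn parameter and the fake_render stub are hypothetical stand-ins for quarto's render so that the snippet runs without quarto or kedro installed, and the dictionary stands in for whatever object the catalog would load.

```python
from typing import Any, Callable, List


def generate_report(report: str, render_fn: Callable[[str], None], **kwargs: Any) -> List[str]:
    """Render `report` and return the names of the datasets it depends on."""
    dependencies = sorted(kwargs)
    print("This report depends on:")
    for name in dependencies:
        print(name)
    # Only the report path reaches the renderer; the loaded objects are ignored.
    render_fn(report)
    return dependencies


rendered: List[str] = []


def fake_render(path: str) -> None:
    # Hypothetical stub: records the path instead of calling quarto.
    rendered.append(path)


deps = generate_report(
    "notebooks/report.qmd",
    fake_render,
    output_var={"a": [1, 2, 3]},  # stand-in for the loaded catalog item
)
```

The loaded objects never reach the renderer. Their only job is to make the node wait until the upstream datasets exist.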
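src/project_name/pipelines/generate_reports/pipeline.py:
from kedro.pipeline import Pipeline, node, pipeline

from .nodes import generate_report


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline([
        node(func=generate_report,
             inputs={"report": 'params:report_filename',
                     # must match the dataset name declared in catalog.yml
                     "output_var": "output_var_catalog_entry"},
             outputs=None,
             name='generate_report'),
    ])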
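The reason this works is that the data travels through a file on disk instead of through the command line. As far as I understand it, a pickle dataset entry boils down to a pickle dump on save and a pickle load on read. Below is a rough standard-library sketch of that round trip; save_output and load_output are hypothetical helper names, a temporary directory stands in for data/07_model_output, and a plain dict stands in for the DataFrame.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any


def save_output(obj: Any, filepath: Path) -> None:
    """Serialise `obj` to `filepath`, as the upstream node's dataset would."""
    filepath.parent.mkdir(parents=True, exist_ok=True)
    with filepath.open("wb") as fh:
        pickle.dump(obj, fh)


def load_output(filepath: Path) -> Any:
    """Deserialise the object again, as the report chunk would."""
    with filepath.open("rb") as fh:
        return pickle.load(fh)


# Temporary location standing in for data/07_model_output/output_var.pkl
pkl_path = Path(tempfile.mkdtemp()) / "07_model_output" / "output_var.pkl"

output_var = {"x": [1, 2, 3], "y": [2.0, 4.0, 6.0]}
save_output(output_var, pkl_path)   # pipeline side
restored = load_output(pkl_path)    # report side
```

Any picklable object can be handed over this way, so the approach is not limited to data frames.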