google-cloud-dataprep

Reuse the same recipe for multiple datasets


I want to reuse the recipe I built for one dataset on the rest of my datasets. The structure and headers of all the datasets are the same. Is there a way to import or reuse the recipe without repeating all the steps?


Solution

  • I'm just getting started with Dataprep, but as I understand it you could feed all your sources into the recipe at the start, then fork them back out at the end and use a schedule to run each one.

    Say you have five input files with identical structure, each representing a different sales market. Import all five, and if there's no market column, give each input a small recipe that derives a new column with a static value identifying its market.

    UNION all of these into the recipe (so the core recipe receives one file).

    At the end of the recipe, add a new recipe for each output which runs KEEP, keeping only the data for that market. This will generate five outputs.

    Schedule each of these recipes, and when the schedule runs you will get five different outputs - one for each input.
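
The pattern above (tag each input, union, run one shared recipe, then keep per market) can be sketched in plain Python. This is only an illustration of the data flow, not Dataprep's API: the function names, the `units`/`price` columns, and the revenue step are all hypothetical stand-ins for whatever your real recipe does.

```python
# Illustration only: plain Python standing in for the Dataprep flow.
# None of these names come from Dataprep; they are hypothetical.

def tag_market(rows, market):
    """Derive a static 'market' column so rows stay traceable after the union."""
    return [{**row, "market": market} for row in rows]

def core_recipe(rows):
    """The shared recipe, written once. Hypothetical step: compute revenue."""
    return [{**row, "revenue": row["units"] * row["price"]} for row in rows]

def keep_market(rows, market):
    """Stand-in for a KEEP step: retain only the rows for one market."""
    return [row for row in rows if row["market"] == market]

# Hypothetical inputs with identical structure, one per market.
inputs = {
    "uk": [{"units": 2, "price": 10.0}],
    "us": [{"units": 3, "price": 5.0}, {"units": 1, "price": 4.0}],
}

# UNION: the core recipe receives one combined dataset.
combined = [
    row
    for market, rows in inputs.items()
    for row in tag_market(rows, market)
]
transformed = core_recipe(combined)

# Fork back out: one output per market, each filtered from the shared result.
outputs = {market: keep_market(transformed, market) for market in inputs}
```

The point of the design is that `core_recipe` exists once; adding a sixth market means adding one input and one keep step, not copying the recipe.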