tensorflow, machine-learning, orchestration, tfx

What exactly are orchestrators in ML?


In ML pipeline components, we already specify the inputs and outputs clearly.

For example, in TFX, StatisticsGen takes its input from ExampleGen and outputs some statistics. So the input and output are clear, and the same is true for all components. Why do we need orchestrators, then? If anyone knows, please help me.


Solution

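  • Clear inputs and outputs only describe how components connect; something still has to schedule and run them. In real-life projects, everything can be much more complicated: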

    • The input data can come from different sources: databases, file systems, third-party services. So you need to do classical ETL before you can start working with the data.
    • You can use different technologies in one pipeline. For instance, Spark as a preprocessing tool, followed by a GPU instance for model training.
    • Last but not least, in production you need to take care of many more things, for instance data validation and model evaluation. I wrote a separate article about how to organize this part using Apache Airflow.
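
To make the points above concrete, here is a minimal sketch of the job an orchestrator does: it takes the declared dependencies between steps, works out a valid execution order, passes each step the outputs of its upstream steps, and retries a step that fails. This is a toy written with the Python standard library only (`graphlib`, Python 3.9+); it is not the TFX or Airflow API. The function `run_pipeline`, the step names (`extract`, `validate`, `train`, `evaluate`) and the simulated failure are all hypothetical and chosen for illustration.

```python
from graphlib import TopologicalSorter


def run_pipeline(steps, dependencies, max_retries=2):
    """Run every step after its dependencies, retrying a failing step.

    steps:        maps a step name to a callable taking a dict of upstream outputs
    dependencies: maps a step name to the names of the steps it depends on
    """
    # Derive an execution order in which every step follows its dependencies.
    order = list(TopologicalSorter(dependencies).static_order())
    results = {}
    for name in order:
        # Hand the step only the outputs of the steps it declared as inputs.
        inputs = {dep: results[dep] for dep in dependencies.get(name, ())}
        for attempt in range(1, max_retries + 2):
            try:
                results[name] = steps[name](inputs)
                break
            except Exception:
                if attempt > max_retries:
                    raise  # give up once the retries are exhausted
    return order, results


# Hypothetical steps standing in for ETL, validation, training and evaluation.
attempts = {"train": 0}


def extract(inputs):
    return [1.0, 2.0, 3.0]


def validate(inputs):
    data = inputs["extract"]
    if not data:
        raise ValueError("no input data")
    return {"count": len(data)}


def train(inputs):
    attempts["train"] += 1
    if attempts["train"] == 1:
        # Simulate a transient failure on the first attempt.
        raise RuntimeError("transient failure")
    data = inputs["extract"]
    return sum(data) / len(data)  # the "model" here is just the mean


def evaluate(inputs):
    return inputs["train"] > 0


steps = {"extract": extract, "validate": validate, "train": train, "evaluate": evaluate}
dependencies = {
    "extract": (),
    "validate": ("extract",),
    "train": ("extract", "validate"),
    "evaluate": ("train",),
}

order, results = run_pipeline(steps, dependencies)
print(order)
print(results)
```

The components only declare what they consume and produce. Ordering, passing outputs along, and recovering from the failed training attempt all happen in `run_pipeline`, and that is the part a real orchestrator takes over, along with scheduling, running steps on different machines, and monitoring.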