
Race condition in Airflow dynamic DAG


I have an Airflow DAG that looks like this (Airflow 1.10.15):

[Image: Airflow DAG graph view]

Task ("cube") types:

  • lvl0_parser: Kubernetes Operator
  • get_rawdata_tables: Python Operator
  • end_of_data_collectors: Dummy Operator
  • all the rest (blue cubes) are also Python Operators.

I'm facing an issue that occurs in rare cases (which I still haven't been able to pin down): the "end_of_data_collectors" cube (a DummyOperator) starts before the previous cubes have finished. An important detail is that the "get_rawdata_tables" cube creates a JSON file describing the next cubes that should be opened (the cubes between "get_rawdata_tables" and "end_of_data_collectors"), and we then create them at runtime - the DAG is not static (which I know is not officially supported or recommended, but it works most of the time). All trigger rules are set to the default ("all_success").
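The JSON-driven generation described above can be sketched roughly like this. The file name, JSON keys, and `collect_*` task ids are assumptions for illustration; the real DAG would turn each entry into a PythonOperator wired between the two fixed cubes:

```python
import json

# Hypothetical content of the file written by get_rawdata_tables;
# the "tables" key and table names are assumptions, not from the post.
spec_json = '{"tables": ["orders", "customers", "events"]}'

def build_dynamic_task_ids(spec_text):
    """Return the task ids that would be created between
    get_rawdata_tables and end_of_data_collectors."""
    spec = json.loads(spec_text)
    # In the real DAG each entry would become an operator, e.g.:
    #   task = PythonOperator(task_id=f"collect_{name}",
    #                         python_callable=..., dag=dag)
    #   get_rawdata_tables >> task >> end_of_data_collectors
    return [f"collect_{name}" for name in spec["tables"]]

print(build_dynamic_task_ids(spec_json))
```

Because this runs at DAG-parse time, the set of tasks the scheduler sees depends on whatever the JSON file contains at that moment, which is exactly why parsing time and timing matter here.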

I suspect the problem is related to the DAG's long parsing time when there are many dynamic cubes, but I'm not sure.

My questions are:

  1. If I configure the Airflow scheduler to run every minute, does it also run when each cube finishes, to evaluate the next dependency? Or only every minute, regardless of what's happening in the DAG?
  2. I've been working like this for 3.5 years, and it has only happened in a few use cases since I changed "end_of_data_collectors" to a DummyOperator (it was a Kubernetes Operator before). Could that be the reason? Does something in this operator behave differently in a way that would explain the issue?
  3. Does my theory about a race condition between the scheduler and DAG parsing make sense?

Thanks


Solution

  • I figured it out: the Airflow scheduler ignores the DummyOperator cube ("end_of_data_collectors"), skips it, marks it as finished successfully, and then continues.

    I also found evidence of this behavior in the Airflow code:

    [Screenshot of the relevant Airflow scheduler code]

    Looks like it wasn't a good idea to use a DummyOperator in dynamic use cases.
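One workaround consistent with this finding is to make the join point a real task that actually goes through the executor, so the scheduler's DummyOperator shortcut no longer applies. A minimal sketch, assuming a no-op PythonOperator is acceptable here (the operator wiring is shown in comments since it requires a live Airflow DAG):

```python
# A no-op callable for a PythonOperator that replaces the DummyOperator.
# Unlike a DummyOperator, a PythonOperator is queued and executed like
# any other task, so it waits for its upstream dependencies normally.

def noop(**context):
    """Do nothing, but run through the executor like any other task."""
    return None

# In the DAG file this would replace the DummyOperator (Airflow 1.10.x):
#   end_of_data_collectors = PythonOperator(
#       task_id="end_of_data_collectors",
#       python_callable=noop,
#       dag=dag,
#   )

print(noop() is None)
```

The trade-off is a slightly heavier task than a DummyOperator, in exchange for the scheduler treating the join point as an ordinary task.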