Search code examples
luigi

Luigi fails to finish all tasks listed in the require method


Say I have a task with the following dependency structure

class ParentTask(luigi.Task):
    def requires(self):
        return [ChildTask(classLevel=x) for x in self.class_level_list]
    def run(self):
        yadayda

The child task runs fine on it own. The parent correctly checks all the children tasks for finish status. Yet when the first child task finishes, the scheduler mark the parent task as finished. with the following message:

   Scheduled 15 tasks of which:
* 3 ran successfully:
    - 1 CleanRecord(...)
    - 1 EstimateQuestionParameter(classLevel=6, qdt=2016-04-19, subject=english)
    - 1 GetLog(classLevel=6, qdt=2016-04-19, subject=english)
* 12 were left pending, among these:
    * 12 were left pending because of unknown reason:
        - 5 EstimateQuestionParameter(classLevel=1...5, qdt=2016-04-19, subject=english)
        - 5 GetLog(pool=None, classLevel=1...5, qdt=2016-04-19, subject=english)
        - 1 UpdateQuestionParameter(qdt=2016-04-19, lastQdt=2016-03-23, subject=english, isInit=False)
        - 1 UpdateQuestionParameterBuffer(qdt=2016-04-19, subject=english, src_table=edw.edw_behavior_question_record_exam_new)

This progress looks :) because there were no failed tasks or missing external dependencies

Solution

  • I think this happens because your worker gets disconnected from the scheduler. The worker's heartbeats don't reach scheduler because of network partition or, more likely, because they're never sent due to this issue.

    You have two options to work-around the problem:

    • Increase worker-disconnect-delay setting ([scheduler] section in config, default 60s)
    • Use more than one worker for your job, e.g. --workers 2 (if it's the latter reason)