python workflow luigi

Nested Luigi tasks do not appear in execution summary


I plan to use Luigi to build a reproducible, failure-resistant hyperparameter-tuning pipeline. To do that, I call my class TrainOneModel multiple times from within the "parent" class HParamOptimizer.

To simplify things, here is a minimal hello-world version:

import luigi

# Child class
class HelloTask(luigi.Task):
    name = luigi.parameter.Parameter(default='Luigi')

    def run(self):
        print(f'Luigi says: Hello {self.name}!')

# Parent class
class ManyHellos(luigi.Task):

    def run(self):
        names = ['Marc', 'Anna', 'John']
        for name in names:
            hello = HelloTask(name=name)
            hello.run()

if __name__ == '__main__':
    luigi.run(['ManyHellos', '--workers', '1', '--local-scheduler'])

Running the script with python filename.py works, and the progress looks :). The names are also printed as expected. However, the execution summary only shows that ManyHellos ran:

Scheduled 1 tasks of which:
* 1 ran successfully:
    - 1 ManyHellos()

Is there a way to include the child class HelloTask, so that I can see how things are progressing in the central scheduler's visualizer?

Thanks, BBQuercus


Solution

  • They are not shown because you are calling their run() directly instead of going through the scheduler. The Luigi way to do it would look more like this:

    import luigi
    
    
    # Child class
    class HelloTask(luigi.Task):
        name = luigi.parameter.Parameter(default='Luigi')
    
        def run(self):
            with self.output().open('w') as fout:
                fout.write(f'Luigi says: Hello {self.name}!')
    
        def output(self):
            # An output target is needed for the scheduler to verify whether
            # the task was run.
            return luigi.LocalTarget(f'./names/{self.name}.txt')
    
    
    # Parent class
    class ManyHellos(luigi.Task):
        def run(self):
            names = ['Marc', 'Anna', 'John']
            for name in names:
                yield HelloTask(name=name)  # dynamically schedules a HelloTask
    
    
    if __name__ == '__main__':
        luigi.run(['ManyHellos', '--workers', '1', '--local-scheduler'])
    

    which leads to

    ===== Luigi Execution Summary =====
    
    Scheduled 4 tasks of which:
    * 4 ran successfully:
        - 3 HelloTask(name=Anna,John,Marc)
        - 1 ManyHellos()
    
    This progress looks :) because there were no failed tasks or missing dependencies
    

    when executed. Note that you can also yield multiple tasks at once, with yield [HelloTask(name) for name in names].
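
To see why a direct run() call stays invisible while a yielded task gets counted, here is a toy model in plain Python. It is only a sketch: ToyScheduler, Hello, DirectParent, YieldingParent, BatchParent and run_with are hypothetical names made up for illustration, and none of this is Luigi's API. Luigi's actual worker is considerably more involved (for example, it checks output targets for completeness), but the basic idea is the same: whoever drives run() can only record what run() hands back to it.

```python
# Toy model only: every name here is hypothetical, none of it is Luigi's API.
import inspect

NAMES = ["Marc", "Anna", "John"]


class Hello:
    def __init__(self, name):
        self.name = name

    def run(self):
        self.greeting = f"Hello {self.name}!"

    def __repr__(self):
        return f"Hello({self.name})"


class DirectParent:
    def run(self):
        for name in NAMES:
            Hello(name).run()  # plain method call; the scheduler never sees it

    def __repr__(self):
        return "DirectParent()"


class YieldingParent:
    def run(self):
        for name in NAMES:
            yield Hello(name)  # hands the child back to whoever drives run()

    def __repr__(self):
        return "YieldingParent()"


class BatchParent:
    def run(self):
        yield [Hello(name) for name in NAMES]  # several children in one yield

    def __repr__(self):
        return "BatchParent()"


class ToyScheduler:
    """Records every task that it executes itself."""

    def __init__(self):
        self.ran = []

    def execute(self, task):
        result = task.run()
        # A generator-style run() yields sub-tasks (or lists of them).
        if inspect.isgenerator(result):
            for requested in result:
                batch = requested if isinstance(requested, list) else [requested]
                for child in batch:
                    self.execute(child)
        self.ran.append(repr(task))


def run_with(parent):
    scheduler = ToyScheduler()
    scheduler.execute(parent)
    return scheduler.ran


print(run_with(DirectParent()))    # only the parent is recorded
print(run_with(YieldingParent()))  # three Hello tasks plus the parent
print(run_with(BatchParent()))     # same children, requested in one batch
```

In the DirectParent case the children still do their work, which matches the question: the names are printed, yet only one task shows up in the summary.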