I plan on using Luigi to write a reproducible and failure-resistant hyperparameter-tuning task. Therefore I call my class TrainOneModel
multiple times within the "parent" class HParamOptimizer
.
To simplify things here is an easier hello world version:
import luigi
# Child class
class HelloTask(luigi.Task):
name = luigi.parameter.Parameter(default='Luigi')
def run(self):
print(f'Luigi says: Hello {self.name}!')
# Parent class
class ManyHellos(luigi.Task):
def run(self):
names = ['Marc', 'Anna', 'John']
for name in names:
hello = HelloTask(name=name)
hello.run()
if __name__ == '__main__':
luigi.run(['ManyHellos', '--workers', '1', '--local-scheduler'])
Running the script with python filename.py
works and the progress looks :). Names are also printed as expected, however, the execution summary only shows that ManyHellos
ran:
Scheduled 1 tasks of which:
* 1 ran successfully:
- 1 ManyHellos()
Is there a possibility of including the child class HelloTask
to view how things are progressing in the central scheduling visualizer?
Thanks, BBQuercus
They are not shown because you're executing their run() manually instead of going through the scheduler. The luigi way to do it would be more like this:
import luigi
# Child class
class HelloTask(luigi.Task):
name = luigi.parameter.Parameter(default='Luigi')
def run(self):
with self.output().open('w') as fout:
fout.write(f'Luigi says: Hello {self.name}!')
def output(self):
# An output target is needed for the scheduler to verify whether
# the task was run.
return luigi.LocalTarget(f'./names/{self.name}.txt')
# Parent class
class ManyHellos(luigi.Task):
def run(self):
names = ['Marc', 'Anna', 'John']
for name in names:
yield HelloTask(name=name) # dynamically schedules a HelloTask
if __name__ == '__main__':
luigi.run(['ManyHellos', '--workers', '1', '--local-scheduler'])
which leads to
===== Luigi Execution Summary =====
Scheduled 4 tasks of which:
* 4 ran successfully:
- 3 HelloTask(name=Anna,John,Marc)
- 1 ManyHellos()
This progress looks :) because there were no failed tasks or missing dependencies
when executed. Note that you could also yield multiple tasks at once, with yield [HelloTask(name) for name in names]