If one of the tasks in the Luigi graph needs to run on a remote Hadoop cluster, is that possible? The machine on which Luigi runs is different from the Hadoop cluster. Can Luigi still check whether an HDFS file on the remote cluster exists?
I tried to find documentation for this but wasn't able to.
Yes. A Luigi task can launch any script or command, so the task itself can be the thing that starts the job on the remote cluster.
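As a rough illustration of launching a script on another machine, here is a minimal sketch that uses only the Python standard library. It assumes the Luigi host has passwordless SSH access to a machine that can submit Hadoop jobs. The host name, script path, and the helper names `build_remote_command` and `run_remote` are hypothetical, not part of Luigi.

```python
import shlex
import subprocess


def build_remote_command(host, script, *args):
    """Build an ssh command line that runs `script` with `args` on `host`.

    Each part is shell-quoted because ssh passes the remote command
    to a shell on the other side.
    """
    remote = " ".join(shlex.quote(part) for part in (script,) + args)
    return ["ssh", host, remote]


def run_remote(host, script, *args):
    """Run the script remotely and raise CalledProcessError on failure."""
    subprocess.run(build_remote_command(host, script, *args), check=True)


# Hypothetical usage (host and script path are placeholders):
# run_remote("edge.example.com", "/opt/jobs/run.sh", "2024-01-01")
```

Something like `run_remote(...)` could be called from a task's `run()` method. Because `check=True` raises on a non-zero exit code, a failed remote job would also fail the Luigi task.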
The HDFS target documentation is here:
https://luigi.readthedocs.io/en/stable/api/luigi.contrib.hdfs.html
https://luigi.readthedocs.io/en/stable/api/luigi.contrib.hdfs.target.html
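To show what a remote existence check can look like in principle, here is a minimal sketch that queries the WebHDFS REST API directly with the standard library. It assumes WebHDFS is enabled on the cluster and that the NameNode is reachable from the Luigi machine. The host, port, path, and function names are placeholders for illustration, and this is not how Luigi's own HDFS clients are implemented.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def webhdfs_status_url(host, port, hdfs_path, user=None):
    """Build the WebHDFS GETFILESTATUS URL for an absolute HDFS path."""
    if not hdfs_path.startswith("/"):
        raise ValueError("hdfs_path must be absolute")
    query = {"op": "GETFILESTATUS"}
    if user:
        # Only meaningful on clusters without Kerberos (assumption).
        query["user.name"] = user
    return "http://{}:{}/webhdfs/v1{}?{}".format(
        host, port, urllib.parse.quote(hdfs_path), urllib.parse.urlencode(query)
    )


def hdfs_path_exists(host, port, hdfs_path, user=None, timeout=10):
    """Return True if the path exists, False if the NameNode answers 404."""
    url = webhdfs_status_url(host, port, hdfs_path, user)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            json.load(response)  # the body is a FileStatus JSON document
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # any other HTTP error is not treated as "missing"


# Hypothetical usage (placeholder host, port, and path):
# hdfs_path_exists("namenode.example.com", 9870, "/data/out/_SUCCESS")
```

The NameNode HTTP port differs between Hadoop versions and cluster configurations, so the value above is only an example. Within Luigi itself, the target classes in the documentation linked above are the usual way to express this check.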