Tags: dask, dask-distributed, dask-jobqueue

How can I keep a PBSCluster running?


I have access to a cluster running PBS Pro and would like to keep a PBSCluster instance running on the headnode. My current (obviously broken) script is:

import dask_jobqueue

from paths import get_temp_dir


def main():
    temp_dir = get_temp_dir()
    scheduler_options = {'scheduler_file': temp_dir / 'scheduler.json'}
    cluster = dask_jobqueue.PBSCluster(cores=24, memory='100GB', processes=1, scheduler_options=scheduler_options)


if __name__ == '__main__':
    main()

This script is obviously broken: once the cluster is created, main() returns and the cluster is destroyed with it. I imagine I must call some sort of execute_io_loop function, but I can't find anything like that in the API.

So, how can I keep my PBSCluster alive?


Solution

  • The Python API (advanced) section of the docs might be a good way to approach this.

    Mind you, that example shows how to create Schedulers and Workers directly, but the same logic should work for your case.

    import asyncio

    import dask_jobqueue
    from paths import get_temp_dir


    async def create_cluster():
        temp_dir = get_temp_dir()
        scheduler_options = {'scheduler_file': temp_dir / 'scheduler.json'}
        # asynchronous=True lets the cluster start inside the already running event loop
        cluster = await dask_jobqueue.PBSCluster(cores=24, memory='100GB', processes=1,
                                                 scheduler_options=scheduler_options,
                                                 asynchronous=True)
        # Block forever so the coroutine (and with it the cluster) stays alive
        await asyncio.Event().wait()


    if __name__ == "__main__":
        asyncio.get_event_loop().run_until_complete(create_cluster())


    You might have to change the code a bit, but the await at the end keeps create_cluster running, and with it the cluster, until you interrupt the process.
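
    If you'd rather not deal with asyncio at all, simply blocking the main thread should work too. This is just a sketch along the lines of your original script; the sleep interval and the Ctrl-C shutdown are placeholders you may want to adapt:

    import time

    import dask_jobqueue
    from paths import get_temp_dir


    def main():
        temp_dir = get_temp_dir()
        scheduler_options = {'scheduler_file': temp_dir / 'scheduler.json'}
        cluster = dask_jobqueue.PBSCluster(cores=24, memory='100GB', processes=1,
                                           scheduler_options=scheduler_options)
        try:
            # Sleep forever; the process (and the cluster) stays up until interrupted
            while True:
                time.sleep(3600)
        except KeyboardInterrupt:
            cluster.close()  # shut the scheduler and any workers down cleanly


    if __name__ == '__main__':
        main()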

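    Once that process is up, any other process that can see the same temp_dir (for example a script on a login or compute node) should be able to attach to the cluster through the scheduler.json file your scheduler_options makes the scheduler write. A minimal sketch, assuming dask.distributed is installed there:

    from dask.distributed import Client

    from paths import get_temp_dir

    # Connect using the scheduler file written by the PBSCluster's scheduler
    client = Client(scheduler_file=str(get_temp_dir() / 'scheduler.json'))
    print(client.scheduler_info())
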
    Let me know if this works for you.