
How to cancel a Ray job submitted to a Ray cluster?


I have a long-running Ray job:

main.py

import time

import ray


@ray.remote
def square(n: int) -> int:
    time.sleep(50000000)  # Simulate a very long-running task (about 1.6 years)
    return n * n


@ray.remote
def sum_list(numbers: list[int]) -> int:
    return sum(numbers)


if __name__ == "__main__":
    ray.init()

    numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    squared_tasks = [square.remote(n) for n in numbers]
    squared_results: list[int] = ray.get(squared_tasks)
    print(f"{squared_results = }")

    sum_task = sum_list.remote(squared_results)
    total_sum = ray.get(sum_task)
    print(f"{total_sum = }")

    ray.shutdown()

submit.py

from ray.job_submission import JobSubmissionClient

client = JobSubmissionClient("https://ray.example.com")
client.submit_job(
    entrypoint="python src/main.py",
    runtime_env={
        "working_dir": "./",
    },
)
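Cancelling is easier if you keep the ID that submit_job returns instead of discarding it. The sketch below is a minimal example that assumes a Ray 2.x JobSubmissionClient, where submit_job returns the submission ID (auto-generated IDs look like "raysubmit_...") and accepts an optional submission_id argument to choose your own. The helper name submit_long_job and the ID "square-sum-demo" are hypothetical.

```python
from typing import Any, Optional


def submit_long_job(client: Any, submission_id: Optional[str] = None) -> str:
    """Submit src/main.py and return the ID needed to stop it later.

    `client` is assumed to behave like ray.job_submission.JobSubmissionClient;
    passing it in (rather than creating it here) keeps the helper testable.
    """
    job_id = client.submit_job(
        entrypoint="python src/main.py",
        runtime_env={"working_dir": "./"},
        # Optional: choose your own ID instead of the auto-generated one.
        submission_id=submission_id,
    )
    print(f"Submitted job {job_id}")
    return job_id
```

For example, submit_long_job(JobSubmissionClient("https://ray.example.com"), submission_id="square-sum-demo") prints the ID, which you can later pass to ray job stop.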

I submitted this job by running python src/submit.py.

How can I cancel this Ray job?

I found a similar question that was asked here over a year ago, but it has no answer.

Thanks!


Solution

  • I figured it out.

    Suppose you want to cancel a job whose ID is 06000000.


    You can cancel the job by its ID with:

    ray job stop --address=https://ray.example.com 06000000
    

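    You can find more Ray CLI options at https://docs.ray.io/en/latest/cluster/cli.html#ray-stop. Note that this anchor documents ray stop, which stops the Ray processes on a node; that is a different command from ray job stop, whose options are listed in the Ray Jobs CLI reference.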
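If you prefer to cancel from Python rather than the CLI (for example, from the same environment as submit.py), JobSubmissionClient provides stop_job and get_job_status. The sketch below assumes a Ray 2.x client and that stop_job accepts the same ID you would pass to ray job stop; the helper name stop_job_and_wait and its timeout and polling parameters are hypothetical, not part of Ray. It polls because stop_job only requests the stop asynchronously.

```python
import time
from typing import Any

# String values of Ray's terminal JobStatus members (assumed: STOPPED,
# SUCCEEDED, FAILED). JobStatus is a str-based Enum, so `.value` yields these.
TERMINAL_STATUSES = {"STOPPED", "SUCCEEDED", "FAILED"}


def stop_job_and_wait(
    client: Any, job_id: str, timeout_s: float = 60.0, poll_interval_s: float = 1.0
) -> str:
    """Request a stop, then poll until the job reaches a terminal status.

    `client` is assumed to behave like ray.job_submission.JobSubmissionClient.
    Returns the final status as a string; raises TimeoutError if the job has
    not finished within `timeout_s` seconds.
    """
    # stop_job only *requests* the stop; the job may keep running briefly.
    client.stop_job(job_id)
    deadline = time.monotonic() + timeout_s
    while True:
        status = client.get_job_status(job_id)
        # Handles both a JobStatus enum member and a plain string.
        status_str = getattr(status, "value", status)
        if status_str in TERMINAL_STATUSES:
            return status_str
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} still {status_str} after {timeout_s}s")
        time.sleep(poll_interval_s)
```

Usage would look like stop_job_and_wait(JobSubmissionClient("https://ray.example.com"), "06000000"), which should return "STOPPED" once the cluster has terminated the job.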