I am trying to launch a Ray cluster using the YAML file below, but I am getting this error message:

`bash: /root/ray_bootstrap_config.yaml: Permission denied`
I think it may be due to the permissions required to access the root folder on the local machine from which I launch the cluster. If I browse to this folder locally, I am asked for credentials as soon as I click on root.
There are some indications online that I need to set up file mounts, but so far I have not been able to get this working.
Reference: https://github.com/ray-project/ray/issues/9326
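For reference, the `file_mounts` key in a Ray cluster YAML is a mapping from a path on the remote nodes to a path on the local machine. A minimal sketch, assuming a hypothetical local directory `/home/user/pipeline` that should appear at the equally hypothetical `/home/ubuntu/pipeline` on the cluster nodes:

```yaml
# Sketch only: both paths below are hypothetical placeholders.
# Format is REMOTE_PATH: LOCAL_PATH (remote path on the left, local path on the right).
file_mounts:
    "/home/ubuntu/pipeline": "/home/user/pipeline"
```

Whether a file mount alone resolves the `Permission denied` error is not established here; this only shows the syntax.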
The cluster launches initially, but this error occurs when the YAML file is run. It connects to AWS and successfully launches the head and worker nodes, then runs the `initialization_commands` (installing a few dependencies such as boto, etc.) successfully, but then gets stuck on the error shown above.
This is my YAML:
```yaml
# A unique identifier for the head node and workers of this cluster.
cluster_name: ray-pipeline-test #ray_example_aws

# The maximum number of worker nodes to launch in addition to the head
# node. This takes precedence over min_workers. min_workers defaults to 0.
max_workers: 1

docker:
    image: "xxxxxxxx1546.dkr.ecr.eu-west-2.amazonaws.com/xxxxx/pipeline:ray-aws"
    container_name: "ray_xxxxxxx_pipeline_aws" #"ray_nvidia_docker" # e.g. ray_docker
    pull_before_run: True

idle_timeout_minutes: 5

# Cloud-provider specific configuration.
provider:
    type: aws
    region: eu-west-2
    availability_zone: eu-west-2a

initialization_commands:
    #- conda install python==3.6
    # - wget https://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh || true
    # - bash Anaconda3-5.0.1-Linux-x86_64.sh -b -p $HOME/anaconda3 || true
    # - echo 'export PATH="$HOME/anaconda3/bin:$PATH"' >> ~/.bashrc
    # - conda create -n py36 python=3.6 anaconda
    #- wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    # - sh Miniconda3-latest-Linux-x86_64.sh
    - source .bashrc
    - conda update conda -n base
    - conda create -n py36 python=3.6
    - conda activate py36
    - curl -fsSL https://get.docker.com -o get-docker.sh
    - sudo sh get-docker.sh
    - sudo usermod -aG docker $USER
    - sudo systemctl restart docker -f
    - sudo apt-get update
    - sudo apt-get upgrade
    - sudo apt-get install -y python-setuptools
    - sudo apt-get install -y build-essential curl unzip psmisc
    - pip install boto boto3
    - conda install boto boto3
    - pip install awscli
    - sudo pip install --default-timeout=100 future
    - pip install ray==1.0.1.post1
    - aws configure set aws_access_key_id xxxxxxxxxxx
    - aws configure set aws_secret_access_key xxxxxxxxxxxxxxxxxxxxx
    - eval $(aws ecr get-login --no-include-email --region eu-west-2)

auth:
    ssh_user: ubuntu
    ssh_private_key: /home/user/.ssh/aws_ubuntu_test.pem

head_node:
    InstanceType: c5.2xlarge
    ImageId: ami-xxxxxxxb31fd2c
    KeyName: aws_ubuntu_test
    BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
              VolumeSize: 200

worker_nodes:
    InstanceType: c5.2xlarge
    ImageId: ami-xxxxxxxxx31fd2c
    KeyName: aws_ubuntu_test
```
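In Ray 1.x cluster configs, `initialization_commands` run on the host VM before the Docker container is set up, while `setup_commands` run inside the container when `docker` is enabled. The sketch below shows one possible split, assuming the host only needs Docker and a login to the private ECR registry, and that Python packages belong inside the container. The commands are taken from the config above and trimmed for illustration; which commands go where is an assumption, not a verified fix:

```yaml
# Sketch only: the split of commands below is an illustrative assumption.
# Runs on the host VM, before the Docker container is started.
initialization_commands:
    - curl -fsSL https://get.docker.com -o get-docker.sh
    - sudo sh get-docker.sh
    - sudo usermod -aG docker $USER
    - pip install awscli
    - aws configure set aws_access_key_id xxxxxxxxxxx
    - aws configure set aws_secret_access_key xxxxxxxxxxxxxxxxxxxxx
    - eval $(aws ecr get-login --no-include-email --region eu-west-2)

# Runs inside the container (because the docker section is set).
setup_commands:
    - pip install boto boto3
```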
When using a custom Docker image with Ray, make sure it is based on the `rayproject/ray` image: Ray's autoscaler has many expectations about what is installed in the container, which user it runs as, and which settings and optimizations it can change.
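One way to check whether the custom image is the cause, assuming everything else in the config stays the same, is to temporarily point the `docker` section at an official `rayproject/ray` image. The `latest` tag is a placeholder assumption; a tag matching the Ray version used locally (here `ray==1.0.1.post1`) is likely the safer choice:

```yaml
docker:
    # Assumption: replace "latest" with a rayproject/ray tag matching your local Ray version.
    image: "rayproject/ray:latest"
    container_name: "ray_xxxxxxx_pipeline_aws"
    pull_before_run: True
```

If the cluster comes up cleanly with this image, the custom ECR image can then be rebuilt on top of `rayproject/ray` rather than on an unrelated base.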