I'm setting up a cluster of EC2 instances that I can use for "big data" modeling. I want to stay close to the metal and would prefer not to use existing frameworks (e.g. Dask, Databricks).
I can stand up the instances with boto3, and I'm using paramiko to SSH into each one and copy files to it. I have written a TCP server that I'd like to launch on each worker machine, but when I issue the Bash command to launch the server, my session hangs waiting for the server to exit. Below is a stripped-down version of what I'm trying to do. I'd like to loop over the instances in a for loop, starting a server on each one with a call to start_server().
```python
import os

import paramiko


def start_server(instance, server_code_fname: str, port: int, rsa_key):
    """
    Args:
        instance: A boto3 'Instance' handle to a machine running on EC2.
        server_code_fname: The name of a file containing the TCP server code
            to run on the remote machine.
        port: The port the remote machine should listen on.
        rsa_key: Private key (e.g. a paramiko.RSAKey) used to authenticate
            the SSH connection.

    Expected effect: Should start a server on the remote machine and return
        fairly quickly.
    Observed effect: The server starts, but the SSH session hangs, presumably
        waiting (forever) for the server to complete, which blocks me from
        launching the server on the NEXT machine.
    """
    dest = "/tmp/" + os.path.basename(server_code_fname)
    copy_local_file_to_remote(
        instance,
        local_path=server_code_fname,
        remote_path=dest,
        rsa_key=rsa_key)
    ip = instance.private_ip_address
    # ----------------------------------------------------------------------
    # This is the command that hangs
    command = f"nohup python3 {dest} --port {port} --ip {ip} & "
    # ----------------------------------------------------------------------
    run_bash_command(instance, command, rsa_key)


def copy_local_file_to_remote(instance,
                              local_path: str,
                              remote_path: str,
                              rsa_key):
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(hostname=instance.public_dns_name,
                username="ec2-user",
                pkey=rsa_key,
                port=22)
    sftp = ssh.open_sftp()  # SFTP session used to upload the file
    sftp.put(local_path, remote_path)
    sftp.close()
    ssh.close()  # release the SSH connection once the upload is done


def run_bash_command(instance, command, rsa_key):
    ssh_client = paramiko.SSHClient()
    ssh_client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh_client.connect(hostname=instance.public_dns_name,
                       username="appropriate_username",
                       pkey=rsa_key)
    _, stdout, stderr = ssh_client.exec_command(command)
    ssh_client.close()
    return stdout, stderr
```
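The hang can be reproduced locally, without SSH or EC2, assuming the SSH side behaves analogously: whoever reads a command's stdout/stderr waits for EOF, and a background job that inherits those streams keeps them open. The sketch below uses subprocess pipes as a stand-in for the SSH channel's output streams and times how long the launch call takes to return; the function name is hypothetical.

```python
import subprocess
import time


def seconds_until_launch_returns(command: str) -> float:
    """Run `command` under bash with stdout/stderr connected to pipes and
    return how long it takes until bash has exited AND both pipes hit EOF.

    The pipes stand in for an SSH channel's stdout/stderr: the reader keeps
    waiting until every process holding a write end has closed it.
    """
    start = time.monotonic()
    subprocess.run(
        ["bash", "-c", command],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,  # stdout is a pipe, not a terminal
        stderr=subprocess.PIPE,
        check=True,
    )
    return time.monotonic() - start
```

On a typical Linux machine, seconds_until_launch_returns("nohup sleep 2 &") should take about as long as the sleep, because nohup leaves a non-terminal stdout alone and the background sleep still holds the pipes. seconds_until_launch_returns("nohup sleep 2 >/dev/null 2>&1 &") should return almost immediately.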
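Replacing the misbehaving command string with

```python
command = f"python3 {dest} --port {port} --ip {ip} >/dev/null 2>&1 &"
```

does the trick. Redirecting stdout and stderr away from the SSH channel means the background server no longer holds the channel's output streams open, so the remote session can finish without waiting for the server to exit. nohup alone does not help here: it only redirects output to nohup.out when stdout is a terminal, and exec_command does not allocate a terminal by default. Thank you Martin Prikryl for the pointer to the solution.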
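If the launch command is built in more than one place, a small helper can keep the redirection from being forgotten and quote the arguments safely. The sketch below is a hypothetical helper, not part of paramiko or boto3, and it assumes the server script accepts the --port and --ip flags used above. It goes slightly beyond the fix above by also redirecting stdin from /dev/null and by allowing output to go to a log file instead of /dev/null, so a server that crashes on startup leaves a traceback behind.

```python
import shlex


def detached_launch_command(script: str, port: int, ip: str,
                            log_path: str = "/dev/null") -> str:
    """Build a shell command that starts `script` in the background with
    stdin, stdout and stderr all detached from the SSH channel.

    Hypothetical helper: assumes the server script takes --port and --ip.
    """
    args = ["python3", script, "--port", str(port), "--ip", ip]
    # Quote each argument so paths with spaces or shell metacharacters are safe.
    server = " ".join(shlex.quote(a) for a in args)
    # stdout -> log file, stderr -> stdout, stdin <- /dev/null, then background.
    return f"nohup {server} >{shlex.quote(log_path)} 2>&1 </dev/null &"
```

In start_server this would replace the hand-built string, e.g. command = detached_launch_command(dest, port, ip, log_path="/tmp/server.log").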