python bash amazon-ec2 boto3 paramiko

How to start a server on a remote machine?


I'm trying to set up a cluster of EC2 instances that I can use for "big data" modeling. I want to stay close to the metal and would prefer not to use existing frameworks (e.g. Dask, Databricks, etc.).

I am able to stand up the instances using boto3, and I'm using paramiko to SSH into each one and copy files to it. I have written a TCP server that I'd like to launch on each worker machine, but when I issue the Bash command that launches the server, my session hangs waiting for the server to complete. Below is a somewhat stripped-down version of what I'm trying to do. I'd like to loop over the instances, starting a server on each one with a call to start_server.

import paramiko
import os


def start_server(instance, server_code_fname: str, port: int, rsa_key):
    """
    Args:
      instance: A boto3 'Instance' handle to a machine running on EC2.
      server_code_fname: The name of a file containing the TCP server code to
        run on the remote machine.
      port:  The port the remote machine should listen on.
      rsa_key:  Security token needed to make the SSH connection.


    Expected Effect: Should start a server on the remote machine and return
      fairly quickly.

    Observed Effect: Server starts, but the SSH session hangs -- presumably
      waiting (forever) for the server to complete, and blocking me from
      launching the server on the NEXT machine.
    """
    dest = "/tmp/" + os.path.basename(server_code_fname)
    copy_local_file_to_remote(
        instance,
        local_path=server_code_fname,
        remote_path=dest,
        rsa_key=rsa_key)

    ip = instance.private_ip_address
    # ----------------------------------------------------------------------
    # This is the command that hangs
    command = f"nohup python3 {dest} --port {port}  --ip {ip} & "
    # ----------------------------------------------------------------------
    run_bash_command(instance, command, rsa_key)


def copy_local_file_to_remote(instance,
                              local_path: str,
                              remote_path: str,
                              rsa_key):
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(hostname=instance.public_dns_name,
                username="ec2-user",
                pkey=rsa_key,
                port=22)
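    # paramiko copies files over SFTP, so name the handle accordingly.
    sftp = ssh.open_sftp()
    sftp.put(local_path, remote_path)
    sftp.close()
    # Close the SSH connection as well so it is not leaked.
    ssh.close()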


def run_bash_command(instance, command, rsa_key):
    ssh_client = paramiko.SSHClient()
    ssh_client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh_client.connect(hostname=instance.public_dns_name,
                       username="appropriate_username",
                       pkey=rsa_key)
    _, stdout, stderr = ssh_client.exec_command(command)
    ssh_client.close()
    return stdout, stderr

Solution

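  • Replacing the command string that hangs with

       command = f"python3 {dest} --port {port} --ip {ip} >/dev/null 2>&1 &"

    does the trick. The likely reason is that the backgrounded server otherwise keeps the session's stdout and stderr open, so the SSH channel never sees end-of-file. Redirecting both streams to /dev/null lets the remote shell return immediately. Thank you, Martin Prikryl, for the pointer to the solution.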
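The fix above can be wrapped in a small helper, sketched here under a few assumptions. The names build_detached_command and launch_detached are hypothetical and not part of the original post. The log_path parameter and the extra "< /dev/null" stdin redirect are additions for illustration only. The ssh_client argument is assumed to be an already connected paramiko.SSHClient.

```python
import shlex


def build_detached_command(dest, port, ip, log_path="/dev/null"):
    """Build a shell command that starts the server in the background.

    stdout and stderr go to log_path so the background process does not
    hold the SSH channel's streams open. The stdin redirect is an extra
    precaution and is not part of the original answer.
    """
    args = ["python3", dest, "--port", str(port), "--ip", ip]
    # Quote each argument so paths with spaces survive the remote shell.
    quoted = " ".join(shlex.quote(a) for a in args)
    return f"{quoted} > {shlex.quote(log_path)} 2>&1 < /dev/null &"


def launch_detached(ssh_client, command):
    """Run command over an already connected paramiko.SSHClient.

    Returns the exit status of the remote shell, which finishes as soon
    as the server has been put in the background.
    """
    _, stdout, _ = ssh_client.exec_command(command)
    return stdout.channel.recv_exit_status()
```

Inside start_server, the command could then be built with build_detached_command(dest, port, ip) and passed to launch_detached before the client is closed. Pointing log_path at a file such as /tmp/server.log (a hypothetical path) would keep the server's output for debugging instead of discarding it.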