Search code examples
bashsshgoogle-bigquerydouble-quotessingle-quotes

wrapping commands through ssh: how to manage complex quotes?


I use an HPC cluster. The compute nodes can't have access to internet, only the frontal.

So I want to wrap all the commands that need to access internet in order to execute them on the frontal.

ex: for wget

#!/bin/bash
ssh frontal /bin/wget "$@"

-> works fine

I have to wrap this bq (google BigQuery) command: bq --format=json query "SELECT * FROM [bigquery-public-data:cloud_storage_geo_index.sentinel_2_index] WHERE sensing_time LIKE '2016%' AND mgrs_tile == '32ULU' ORDER BY sensing_time ASC LIMIT 1000;"

I managed to requote the command and to launch it successfully on CLI: ssh frontal '~/downloads_and_builds/builds/google-cloud-sdk/bin/bq --format=json query "SELECT * FROM [bigquery-public-data:cloud_storage_geo_index.sentinel_2_index] WHERE sensing_time LIKE '"'"'2016%'"'"' AND mgrs_tile == '"'"'32ULU'"'"' ORDER BY sensing_time ASC LIMIT 1000;"'

Now I want to write a wrapper named bq able to get the parameters and launch this command through ssh ... here is what i have tried :

#!/bin/bash
set -eu

# all parameters in an array
args=("$@")

# unset globing (there's a * in the SELECT clause)
set -f

# managing inner quotes
arg2=`echo "${args[2]}" | perl -pe 's/'\''/'\''"'\''"'\''/g'`

# put back double quotes (") suppressed by bash
args="${args[0]} ${args[1]} \"${arg2}\""

# build command with parameters
cmd="~/downloads_and_builds/builds/google-cloud-sdk/bin/bq $args"

echo ""
echo "command without external quotes"
echo "$cmd"
echo ""

echo "testing it ..."
ssh hpc-login1 "$cmd"
echo ""

# wrapping command between simple quotes (like on the CLI)
cmd="'"'~/downloads_and_builds/builds/google-cloud-sdk/bin/bq '"$args""'"
echo "commande with external quotes"
echo "$cmd"
echo ""

echo "testing it ..."
ssh hpc-login1 $cmd
echo "done"

Here is the output of this script: $ bq --format=json query "SELECT * FROM [bigquery-public-data:cloud_storage_geo_index.sentinel_2_index] WHERE sensing_time LIKE '2016%' AND mgrs_tile == '32ULU' ORDER BY sensing_time ASC LIMIT 1000;"

command without external quotes
~/downloads_and_builds/builds/google-cloud-sdk/bin/bq --format=json query "SELECT * FROM [bigquery-public-data:cloud_storage_geo_index.sentinel_2_index] WHERE sensing_time LIKE '"'"'2016%'"'"' AND mgrs_tile == '"'"'32ULU'"'"' ORDER BY sensing_time ASC LIMIT 1000;"

testing it ...
Waiting on bqjob_r102b0c22cdd77c2d_000001629b8391a3_1 ... (0s) Current status: DONE   

commande with external quotes
'~/downloads_and_builds/builds/google-cloud-sdk/bin/bq --format=json query "SELECT * FROM [bigquery-public-data:cloud_storage_geo_index.sentinel_2_index] WHERE sensing_time LIKE '"'"'2016%'"'"' AND mgrs_tile == '"'"'32ULU'"'"' ORDER BY sensing_time ASC LIMIT 1000;"'

testing it ...
bash: ~/downloads_and_builds/builds/google-cloud-sdk/bin/bq --format=json query "SELECT * FROM [bigquery-public-data:cloud_storage_geo_index.sentinel_2_index] WHERE sensing_time LIKE '2016%' AND mgrs_tile == '32ULU' ORDER BY sensing_time ASC LIMIT 1000;": Aucun fichier ou dossier de ce type (in english: no file or directory of this kind)

As you can see, I managed to get a correct command string, just like the one which works on CLI, but it doesn't work in my script:

  1. The first attempt succeeded but gives no output (I have tried to redirect it in a file: the file were created but is empty)
  2. In the second attempt (with external simple quotes, just like the CLI command that worked), bash take the quoted arg as a block and don't find the command ...

Has somebody an idea on how to launch a complex command (with quotes, wildcards ...) like this one through ssh using a wrapper script ?

(ie. one wrapper named foo able to replace a foo command and execute it correctly through ssh with the arguments provided)


Solution

  • ssh has the same semantics as eval: all arguments are concatenated with spaces and then evaluated as a shell command.

    You can have it work with execve semantics (like sudo) by having a wrapper escape the arguments:

    remotebq() { 
      ssh yourhost "~/downloads_and_builds/builds/google-cloud-sdk/bin/bq $(printf '%q ' "$@")"
    }
    

    This quotes thoroughly and consistently, so you no longer have to worry about adding additional escaping. It'll run exactly what you tell it (as long as your remote shell is bash):

    remotebq --format=json query "SELECT * FROM [bigquery-public-data:cloud_storage_geo_index.sentinel_2_index] WHERE sensing_time LIKE '2016%' AND mgrs_tile == '32ULU' ORDER BY sensing_time ASC LIMIT 1000;"
    

    However, the downside to running exactly what you tell it is that now you need to know exactly what you want to run.

    For example, you can no longer pass '~/foo' as an argument because this is not a valid file: ~ is a shell feature and not a directory name, and when it's correctly escaped it will not be replaced by your home directory.