
GNU Parallel -q option causing BCP "unknown option" errors (different string quotes on local vs remote hosts)


I'm seeing very strange behavior when using GNU Parallel to distribute export jobs using bcp from mssql-tools. It appears that with parallel's -q option, strings are interpreted differently on the local host than on remote hosts.

When run only as a loop through the files on the local host, the bcp processes throw no errors.

However, when the file exports are distributed with parallel, the bcp processes executing on the local host throw

/opt/mssql-tools/bin/bcp: unknown option

errors, while those executing on remote hosts (via a --sshloginfile param) finish successfully. The basic code being run looks like...

# setting some vars to pass
TO_SERVER_ODBCDSN="-D -S MyMSSQLServer"
TO_SERVER_IP="-S 172.18.54.22"
DB="$dest_db" #TODO: enforce being more careful with this value
TABLE="$tablename" # MUST exist beforehand, case matters
USER=$(tail -n+1 $source_home/mssql-creds.txt | head -1)
PASSWORD=$(tail -n+2 $source_home/mssql-creds.txt | head -1)
DATAFILES="/some/path/to/files/"
TARGET_GLOB="*.tsv"
RECOMMEDED_IMPORT_MODE='-c' # makes a HUGE difference, see https://stackoverflow.com/a/16310219/8236733
DELIMITER="\\\t" # (currently not used) DO NOT use format like "'\t'", nested quotes seem to cause hard-to-catch error, want "\t" literal

....

bcpexport() {
    filename=$1
    TO_SERVER_ODBCDSN=$2
    DB=$3
    TABLE=$4 # MUST exist beforehand, case matters
    USER=$5
    PASSWORD=$6
    RECOMMEDED_IMPORT_MODE=$7 # makes a HUGE difference, see https://stackoverflow.com/a/16310219/8236733
    DELIMITER=$8 # not currently used
    WORKDIR=$9
    LOGDIR=${10}

    ....

    /opt/mssql-tools/bin/bcp "$TABLE" in "$localfile" \
        $TO_SERVER_ODBCDSN \
        -U $USER -P $PASSWORD \
        -d $DB \
        $RECOMMEDED_IMPORT_MODE
        -t "\t" \
        -e ${localfile}.bcperror.log
}

export -f bcpexport
parallelization_pernode=5
parallel -q -j $parallelization_pernode \
        --sshloginfile $source_home/parallel-nodes.txt \
        --env bcpexport \
        bcpexport {} "$TO_SERVER_ODBCDSN" $DB $TABLE $USER $PASSWORD $RECOMMEDED_IMPORT_MODE $DELIMITER $workingdir $logdir \
        ::: $DATAFILES/$TARGET_GLOB  #from hdfs nfs gateway

Looking at the bash interpretation of the processes (by running ps -aux | grep bcp on the hosts that parallel is given in the --sshloginfile), for the remote hosts we see...

/bin/bash -c bcpexport() { ... /opt/mssql-tools/bin/bcp "$TABLE" in "$localfile" $TO_SERVER_ODBCDSN -U $USER -P $PASSWORD -d $DB $RECOMMEDED_IMPORT_MODE;  -t "\t" -e ${localfile}.bcperror.log; ...

for the local host, the bash interpretation is...

/bin/bash -c bcpexport() { ... /opt/mssql-tools/bin/bcp "$TABLE" in "$localfile" $TO_SERVER_ODBCDSN -U $USER -P $PASSWORD -d $DB $RECOMMEDED_IMPORT_MODE;  -t "\t" -e ${localfile}.bcperror.log; ...

that is, they look the same.

My current thought is that the "\t" in the bcp command is being interpreted in a problematic way. Debugging parallel without vs with the -q option we see...

$ parallel -j 5 --sshloginfile ./parallel-nodes.txt echo "Number {}: Running on \`hostname\`: \t" ::: 1 2 3 4 5              
Number 4: Running on HW04.ucera.local: t
Number 1: Running on HW04.ucera.local: t
Number 2: Running on HW03.ucera.local: t
Number 5: Running on HW03.ucera.local: t
Number 3: Running on HW02.ucera.local: t
$ parallel -q -j 5 --sshloginfile ./parallel-nodes.txt echo "Number {}: Running on \`hostname\`: \t" ::: 1 2 3 4 5           
Number 1: Running on `hostname`:    
Number 4: Running on `hostname`:    
Number 3: Running on `hostname`: \t
Number 2: Running on `hostname`: \t
Number 5: Running on `hostname`: \t

The bcp command needs the "\t" literal, not the "t" literal, and I suspect there are several other similar string corruptions. (I believe \t is the default for bcp anyway, but this is just an example and I want to keep \t for code clarity.) I am not sure how to get this for both local and remote nodes, or even why the behavior differs between remote and local.

Basically, I need the strings to be exactly the same on both local and remote hosts, even if they contain spaces or escape characters. (Note: I think this used not to be a problem; I have an older script on other machines that doesn't have it.)

I'm not sure whether this counts more as a parallel problem or a bcp problem (currently thinking something is going wrong with the -q option in parallel, but not sure). Does anyone have debugging suggestions or fixes? Any ideas of what could be happening?


Solution

  • Firstly, the reason hostname is not expanded is -q. It quotes the ` characters, so the shell no longer treats them as a command substitution.
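
The effect of quoting a backtick can be reproduced without parallel. The sketch below is only an illustration of the shell behaviour, not of how parallel implements -q; the command `echo myhost` is a hypothetical stand-in for `hostname` so that the output is predictable.

```shell
#!/bin/sh
# Hypothetical stand-in: `echo myhost` replaces `hostname` for a predictable result.

# Bare backticks: the inner shell performs the command substitution.
expanded=$(sh -c 'echo "Running on `echo myhost`"')

# Backslash-escaped backticks: the inner shell prints them literally,
# which resembles what happens once the backticks have been quoted.
literal=$(sh -c 'echo "Running on \`echo myhost\`"')

printf '%s\n' "$expanded"
printf '%s\n' "$literal"
```

The first line printed has the command's output substituted in; the second still contains the backticks, matching the `hostname` text seen in the -q run of the question.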

    Secondly, I think what you are seeing is the difference in behaviour between the built-in echo and /bin/echo. The built-in echo depends on the shell. Here I compare echo \\\\t in different shells:

    $ parallel --onall --tag -S sh@lo,bash@lo,csh@lo,tcsh@lo,ksh@lo,zsh@lo echo  \\\\t ::: a 
    bash@lo \t a
    tcsh@lo          a
    sh@lo    a
    ksh@lo \t a
    zsh@lo   a
    csh@lo \t a
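
As an aside on the comparison above: when testing what string actually arrives on a host, printf is a more predictable probe than echo, because `%s` never interprets backslashes in its argument. A minimal sketch, with `show_literal` and `show_tab` as hypothetical helper names:

```shell
#!/bin/sh
# Print the argument byte for byte, followed by a newline.
# '%s' does not interpret backslash escapes inside the argument.
show_literal() {
    printf '%s\n' "$1"
}

# Print a real tab character. The escape sits in the format string,
# where printf is specified to interpret it.
show_tab() {
    printf '\t'
}

show_literal '\t'   # the two characters: backslash, t
```

Piping such output through `od -c` makes the difference between a literal backslash-t and a real tab visible.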
    

    That does not, however, get you closer to a solution. If I were you I would use env_parallel to copy the environment variables. And if the login shell on the remote systems is not the same as your shell, then set PARALLEL_SHELL to force using that shell.

    So:

    #!/bin/bash
    
    env_parallel --session
    
    # setting some vars to pass
    TO_SERVER_ODBCDSN="-D -S MyMSSQLServer"
    :
    :
    PARALLEL_SHELL=bash env_parallel -q -j $parallelization_pernode ...
    # (no need to use either --env or 'export -f' when using 'env_parallel --session')
    
    # Cleanup (not needed if this is the last line in the script)
    env_parallel --end-session