Seeing very strange behavior where when when using gnu parallel
to distribute export jobs using bcp
from mssql-tools
. It appears that when using the -q
option for parallel
, strings are interpreted differently on local host than on remote hosts.
Running only as a loop through files on local host, the bcp processes throws no errors
However, distributing the file exports with parallel
, the bcp processes executing on the local host throw
/opt/mssql-tools/bin/bcp: unknown option
errors, while those executing on remote hosts (via a --sshloginfile
param) finish successfully. The basic code being run looks like...
# setting some vars to pass
TO_SERVER_ODBCDSN="-D -S MyMSSQLServer"
TO_SERVER_IP="-S 172.18.54.22"
DB="$dest_db" #TODO: enforce being more careful with this value
TABLE="$tablename" # MUST exist beforehand, case matters
USER=$(tail -n+1 $source_home/mssql-creds.txt | head -1)
PASSWORD=$(tail -n+2 $source_home/mssql-creds.txt | head -1)
DATAFILES="/some/path/to/files/"
TARGET_GLOB="*.tsv"
RECOMMEDED_IMPORT_MODE='-c' # makes a HUGE difference, see https://stackoverflow.com/a/16310219/8236733
DELIMITER="\\\t" # (currently not used) DO NOT use format like "'\t'", nested quotes seem to cause hard-to-catch error, want "\t" literal
....
bcpexport() {
filename=$1
TO_SERVER_ODBCDSN=$2
DB=$3
TABLE=$4 # MUST exist beforehand, case matters
USER=$5
PASSWORD=$6
RECOMMEDED_IMPORT_MODE=$7 # makes a HUGE difference, see https://stackoverflow.com/a/16310219/8236733
DELIMITER=$8 # not currently used
WORKDIR=$9
LOGDIR=${10}
....
/opt/mssql-tools/bin/bcp "$TABLE" in "$localfile" \
$TO_SERVER_ODBCDSN \
-U $USER -P $PASSWORD \
-d $DB \
$RECOMMEDED_IMPORT_MODE
-t "\t" \
-e ${localfile}.bcperror.log
}
export -f bcpexport
parallelization_pernode=5
parallel -q -j $parallelization_pernode \
--sshloginfile $source_home/parallel-nodes.txt \
--env bcpexport \
bcpexport {} "$TO_SERVER_ODBCDSN" $DB $TABLE $USER $PASSWORD $RECOMMEDED_IMPORT_MODE $DELIMITER $workingdir $logdir \
::: $DATAFILES/$TARGET_GLOB #from hdfs nfs gateway
Looking at the bash interpretation of the processes (by running ps -aux | grep bcp
on the hosts that parallel
is given in the --sshloginfile
) for the remote hosts we see...
/bin/bash -c bcpexport() { ... /opt/mssql-tools/bin/bcp "$TABLE" in "$localfile" $TO_SERVER_ODBCDSN -U $USER -P $PASSWORD -d $DB $RECOMMEDED_IMPORT_MODE; -t "\t" -e ${localfile}.bcperror.log; ...
for the local host, the bash interpretation is...
/bin/bash -c bcpexport() { ... /opt/mssql-tools/bin/bcp "$TABLE" in "$localfile" $TO_SERVER_ODBCDSN -U $USER -P $PASSWORD -d $DB $RECOMMEDED_IMPORT_MODE; -t "\t" -e ${localfile}.bcperror.log; ...
that is, they look the same.
My current thought is that the "\t" in the bcp command is being interpreted in a problematic way. Debugging parallel
without vs with the -q
option we see...
$ parallel -j 5 --sshloginfile ./parallel-nodes.txt echo "Number {}: Running on \`hostname\`: \t" ::: 1 2 3 4 5
Number 4: Running on HW04.ucera.local: t
Number 1: Running on HW04.ucera.local: t
Number 2: Running on HW03.ucera.local: t
Number 5: Running on HW03.ucera.local: t
Number 3: Running on HW02.ucera.local: t
$ parallel -q -j 5 --sshloginfile ./parallel-nodes.txt echo "Number {}: Running on \`hostname\`: \t" ::: 1 2 3 4 5
Number 1: Running on `hostname`:
Number 4: Running on `hostname`:
Number 3: Running on `hostname`: \t
Number 2: Running on `hostname`: \t
Number 5: Running on `hostname`: \t
The bcp command needs the "\t" literal not the "t" literal (and I suspect several other similar string corruptions (also I do believe that \t is the default for bcp anyway, but this is just an example and want to keep \t for code clarity)), but not sure how to get this for both local and remote nodes or even why this behavior differs by remote vs local.
Basically, need the the strings to be exactly the same for both local and remote hosts even if strings have spaces or escape characters in them (note, I think this used to not be the case (have older script on other machines that don't have this problem))
Not sure if this is counts more as a parallel
problem or a bcp
problem (currently thinking something is going wrong with the -q
option in parallel
, but not sure). Anyone have any debugging suggestions or fixes? Ideas of what could be happening?
Firstly, the reason why hostname
is not expanded is due to -q
. It quotes the ` so that it does not expand.
Secondly, I think what you see is the different behaviours in built-in echo
and /bin/echo
. Built-in echo
depends on the shell. Here I compare echo \\\\t
in different shells:
$ parallel --onall --tag -S sh@lo,bash@lo,csh@lo,tcsh@lo,ksh@lo,zsh@lo echo \\\\t ::: a
bash@lo \t a
tcsh@lo a
sh@lo a
ksh@lo \t a
zsh@lo a
csh@lo \t a
That does not, however, get you closer to a solution. If I were you I would use env_parallel
to copy the environment variables. And if the login shell on the remote systems are not the same as your shell, then set PARALLEL_SHELL
to force using that shell.
So:
#!/bin/bash
env_parallel --session
# setting some vars to pass
TO_SERVER_ODBCDSN="-D -S MyMSSQLServer"
:
:
PARALLEL_SHELL=bash env_parallel -q -j $parallelization_pernode ...
(no need to use neither --env nor 'export -f' when using 'env_parallel --session')
# Cleanup (not needed if this is the last line in the script)
env_parallel --end-session