Search code examples
hdfsbcp

Using mssql-tools bcp from HDFS NFS mount


Trying to run bcp from the mssql-tools package (using centos7) to export tsv file data from an HDFS location mounted to local FS via NFS Gateway, but running into errors like...

SQLState = S1000, NativeError = 0

Error = [Microsoft][ODBC Driver 17 for SQL Server]Unable to open BCP error-file

or

SQLState = S1000, NativeError = 0

Error = [Microsoft][ODBC Driver 17 for SQL Server]Unable to open BCP host data-file

The bcp command being run looks like...

/opt/mssql-tools/bin/bcp "$TABLE" in \
        "$filename" \
        $TO_SERVER_ODBCDSN \
        -U $USER -P $PASSWORD \
        -d $DB \
        $RECOMMEDED_IMPORT_MODE \
        -t "\t" \
        -e ${filename}.bcperror.log

# with the actual commmand w/ variables resolved looks like...
/opt/mssql-tools/bin/bcp "ACCOUNT" in \
    "/HDFS_NFS/path/to/tsv/1_0_0.tsv" \
    -D -S MyMSSQLServer \
    -U myuser -P mypassword \
    -d SOME_MSSQL_DB \
    -c \
    -t \t \
    -e /HDFS_NFS/path/to/store/errlogs/1_0_0.tsv.bcperror.log

all of this seems fine to me, yet also sometimes getting errors like...

/opt/mssql-tools/bin/bcp: unknown option

usage: /opt/mssql-tools/bin/bcp {dbtable | query} {in | out | queryout | format} datafile ...

so not sure what that's about either. My /etc/odbc.ini file looks like...

[MyMSSQLServer]
Driver=ODBC Driver 17 for SQL Server
Description=My MS SQL Server
Trace=No
Server=<the server's IP>

Anyone know any further debugging tips or fixes for this?


Solution

  • The problem appears to be that the error logging file specified by the -e option was already existing in the location specified and that HDFS (mounted via NFS or not) did not like the bcp command trying to overwrite it. You would normally do something like

    hadoop fs -put -f /some/local/file /hdfs/location/for/file
    

    and I assume that bcp was attempting something else via the NFS gateway that was not this. I suppose there also could have been latency problems with bcp accessing the HDFS NFS location. Running the bcp command without the -e option worked in the example originally posted.

    ** As a workaround, based on another SO post, I bring the files down (hadoop fs -get ...) to a local temp dir /home/user/tmp/<some uuid>/ and do what needs to be done there, then hadoop fs -put ....