Search code examples
hadoopnetezzasqoop

Sqoop Direct Import Netezza Table Permissions


We are using netezza direct to import data from Netezza to Hadoop as part of POC.

Have couple of questions on Netezza specific and Netezza Sqoop Integration.

Q1. Does Sqoop direct mode always require CREATE EXTERNAL TABLE and DROP privilege to perform direct transfer?

Q2. Does external table get created in Netezza ? If yes, which database ? I see Sqoop using below query :

CREATE EXTERNAL TABLE '/yarn/local/usercache/someuser/appcache/application_1483624176418_42787/work/task_1483624176418_42787_m_000000/nzexttable-0.txt' 
USING (REMOTESOURCE 'JDBC'  
BOOLSTYLE 'T_F' 
 CRINSTRING FALSE  DELIMITER 44 ENCODING 
'internal'  FORMAT 'Text'  INCLUDEZEROSECONDS TRUE  
NULLVALUE 'null'  MAXERRORS 1) 
AS SELECT * FROM SOME_TBL WHERE (DATASLICEID % 3)

Does it create in Database selected in db URL ? jdbc:netezza://somehostname:5480/SOME_DB_1

Q3. If Netezza needs to create External tables, can it create the external table in different database than the one which the actual table with data that needs to be pulled into Hadoop. What is the config change that needs to be done ?

Q4. Does Sqoop run DROP table on external table which was created by individual mappers ?

Sqoop command Used :

export HADOOP_CLASSPATH=/opt/nz/lib/nzjdbc3.jar
sqoop import -D mapreduce.job.queuename=some_queue 
-D yarn.nodemanager.local-dirs=/tmp -D mapreduce.map.log.level=DEBUG 
--direct --connect jdbc:netezza://somehost:5480/SOME_DB --table SOME_TBL_1 
--username SOMEUSER --password xxxxxxx --target-dir /tmp/netezza/some_tbl_file 
--num-mappers 2 --verbose   

Solution

  • This is what I got as reply in Sqoop User community (Thanks Szabolcs Vasas).

    In case of Netezza direct imports Sqoop executes a CREATE EXTERNAL TABLE command (so you will need CREATE EXTERNAL TABLE privilege) to create a backup of the content of the table to a temporary file and it copies the content of this file to the final output on HDFS. The SQL command you pasted in your email is indeed the one which is executed by Sqoop but as far as I understand from the Netezza documentation (http://www.ibm.com/support/knowledgecenter/SSULQD_7.2.1/com.ibm.nz.load.doc/c_load_create_external_tbl_expls.html, 6th example) this does not really create a new external table in any schema it just backs up the content of the table and because of that no DROP TABLE statement is executed.

    Q1. Yes, Sqoop need CREATE EXTERNAL TABLE but not DROP privilege.

    Q2. Sqoop does not really create a new external table in any schema it just backs up the content of the table (http://www.ibm.com/support/knowledgecenter/SSULQD_7.2.1/com.ibm.nz.load.doc/c_load_create_external_tbl_expls.html, 6th example).

    Q3. Not possible to create an EXTERNAL table in a specific schema.

    Q4. No, Sqoop does not run DROP command.

    Moreover, the table created by sqoop direct process is Netezza TET - Transient external tables. Thus, the external remotesource JDBC table is dropped once the mapper receives the data as NamedFifo. Thus tables are not stored in Netezza after the transfer.