
Configure Sqoop2 TEXT_FILE Output Format


I'm using Sqoop2 (Sqoop 1.99.3-cdh5.1.0) to import data from a PostgreSQL database. The job completes successfully and creates text files in HDFS. The output files are CSV with single-quoted values, but I would like the output to be tab-separated without quotes.

Is the output format of Sqoop2 configurable?


Solution

  • I had the same problem, so I ended up using Sqoop1. Sqoop2 is great, but it has some disadvantages:

    • you cannot schedule Sqoop2 jobs in Oozie, so you can only run them manually.
    • you cannot load data directly into Hive or HBase, only into files.
    • you cannot configure output delimiters and enclosures.

    So I recommend using Sqoop1; it is quite easy:

    sqoop import --connect xxxx --username xxxx --password xxxx --query 'select * from xxx where $CONDITIONS' --target-dir /tmp/xxx -m 1 --fields-terminated-by '|' --enclosed-by '\0'
    

    If you are using Hue jobs, it is better not to write this whole command into the Command field of the Sqoop job. Instead, put each argument into a separate Param (the first param is import, the second --connect, the third the connection string, then --username, etc.).
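    As a rough illustration of the one-argument-per-Param idea, the Bash sketch below stores a version of the import command in an array and prints each element as it would go into a separate Hue Param. The xxxx values are placeholders, and the query includes Sqoop1's $CONDITIONS token, which free-form --query imports require. Because each Param is passed as a single argument, the query presumably goes in without the shell quotes (an assumption about how Hue hands Params to Oozie, not verified here).

```shell
#!/usr/bin/env bash
# Placeholder connection values (xxxx); replace them with your own.
args=(
  import
  --connect xxxx
  --username xxxx
  --password xxxx
  # Single quotes keep $CONDITIONS literal so Sqoop, not the shell, sees it.
  --query 'select * from xxx where $CONDITIONS'
  --target-dir /tmp/xxx
  -m 1
  --fields-terminated-by '|'
  # Sqoop treats the two characters \0 as "no enclosing character".
  --enclosed-by '\0'
)

# Print one element per line: each line corresponds to one Hue Param.
for i in "${!args[@]}"; do
  printf 'Param %d: %s\n' "$((i + 1))" "${args[$i]}"
done
```

    Note that the whole query ends up as one Param ("select * from xxx where $CONDITIONS"), which is exactly what gets mangled when the full command is pasted into the single Command field.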

    Hope it helps and good luck