Search code examples
hadoophivesqoophadoop2hcatalog

Sqoop incremental export using hcatalog?


Is there a way to use sqoop to do incremental exports ? I am using Hcatalog integration for sqoop.I tried using the --last-value, --check-column options which are used for incremental import, but sqoop gave me error that the options were invalid.


Solution

  • I have not seen incremental sqoop export arguments. The other way you could try is to create a contol_table in hive where you keep log of the table name & timestamp when it was last exported every time.

    create table if not exists control_table (
     table_name  string,
     export_date timestamp
    );
    
    insert into control_table 'export_table1' as table_name, from_unixtime(unix_timestamp()) as export_date from control_table;
    

    If export_table1 is the table you want to export incrementally and assuming if have already executed above two statements.

    --execute below at once   
    --get the timestamp when the table was last executed
    create temporary table control_table_now as select table_name, max(export_date) as last_export_date from control_table group by table_name;
    
    --get incremental rows
    create table new_export_table1 as select field1, field2, field3, .... timestamp1 from export_table1 e, control_table_now c where c.table_name = 'export_table1' and e.timestamp1 >= c.last_export_date;
    
    --append the control_table for next process
    insert into control_table 'export_table1' as table_name, from_unixtime(unix_timestamp()) as export_date from control_table;
    

    Now, export the new_export_table1 table which is incrementally created using sqoop export command.