Search code examples
amazon-web-servicesamazon-s3hiveamazon-redshift

loading Redshift from S3 (with partitions)


My S3 file (s3://MY_s3/stage/my_s3_file) has a partition on load_dt:

S3 Structure:

-MY_S3
 --stage
  ---my_s3_file
    ----load_dt=2016-02-01
         ----000024
    ----load_dt=2016-02-02
         ----000036

The actual files are under load_dt partitions like 000024 & 000036.

The COPY command works fine if I am not keeping load_dt in Redshift table, but when I am adding load_dt in Redshift table the COPY command fails due to data error as the input layout & target layout are not matching (with extra load_dt on target).

The hive table which creates the S3 file shows partitioned load_dt at the end.

How to make this COPY command work with load_dt being on target Redshift?

I even tried changing S3 locations to s3://MY_s3/stage/my_s3_file/load_dt but no luck.


Solution

  • I think I found the answer for my case.

    I was not able to load the hive partition because Hive store that partition value in Hive metadata , that's why the partition column was not there in S3 file.

    Now I am adding a new column to S3 through Hive as Load_Dt_New so the S3 file would have the required column for my Redshift COPY command to work.