Tags: amazon-web-services, amazon-s3, amazon-redshift, manifest

Loading a Redshift table from multiple S3 folders using a manifest


I am using the COPY command with a manifest to load a Redshift table from S3.

The requirement is to load multiple files across several folders, for example:

Path1 : s3://bucket_name/folder_name/folder_1/folder/part*.parquet
Path2 : s3://bucket_name/folder_name/folder_2/folder/part*.parquet
Path3 : s3://bucket_name/folder_name/folder_3/folder/part*.parquet

Each path contains ~1000 files.

How do I create a manifest that loads all of them?

I created the following manifest:

{
    "fileLocations": [
        {"url":"s3://bucket_name/folder_name/folder_1/folder/part*.parquet", "mandatory":false},
        {"url":"s3://bucket_name/folder_name/folder_3/folder/part*.parquet", "mandatory":false},
        {"url":"s3://bucket_name/folder_name/folder_2/folder/part*.parquet", "mandatory":false},
    ]
}

but I get the following error:

Manifest does not contain a list of files.


Solution

  • From the Amazon Redshift documentation, "Using a manifest to specify data files":

    The following example shows the JSON to load files from different buckets and with file names that begin with date stamps:

    {
      "entries": [
        {"url":"s3://mybucket-alpha/2013-10-04-custdata", "mandatory":true},
        {"url":"s3://mybucket-alpha/2013-10-05-custdata", "mandatory":true},
        {"url":"s3://mybucket-beta/2013-10-04-custdata", "mandatory":true},
        {"url":"s3://mybucket-beta/2013-10-05-custdata", "mandatory":true}
      ]
    }
    

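    The problem is most likely that your manifest uses "fileLocations" instead of "entries". COPY looks for the list of files under a top-level "entries" key, which is why it reports that the manifest does not contain a list of files.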
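A quick way to catch this kind of mistake before running COPY is to check the manifest's shape locally. The following is a minimal sketch in Python; manifest_problems is a hypothetical helper (not part of any AWS SDK), and it checks only the structure shown in the documentation example above: a JSON object with an "entries" list whose items carry an s3:// "url".

```python
import json
from typing import List


def manifest_problems(text: str) -> List[str]:
    """Return a list of problems found in a Redshift COPY manifest (empty if none).

    Hypothetical helper: it checks only the shape shown in the Redshift docs,
    i.e. a JSON object with an "entries" list whose items have an s3:// "url".
    """
    try:
        manifest = json.loads(text)
    except json.JSONDecodeError as exc:
        # Catches syntax errors such as a trailing comma after the last entry.
        return [f"not valid JSON: {exc.msg} (line {exc.lineno})"]
    if not isinstance(manifest, dict):
        return ["top level must be a JSON object"]

    entries = manifest.get("entries")
    if entries is None:
        # e.g. a manifest that uses "fileLocations" instead of "entries"
        found = ", ".join(sorted(manifest)) or "no keys"
        return [f'missing "entries" key (found: {found})']
    if not isinstance(entries, list):
        return ['"entries" must be a list']

    problems = []
    for i, entry in enumerate(entries):
        url = entry.get("url") if isinstance(entry, dict) else None
        if not isinstance(url, str) or not url.startswith("s3://"):
            problems.append(f"entry {i}: missing or non-s3:// url")
    return problems
```

Run against the manifest from the question, this reports a JSON error first, because the trailing comma after the last entry is not valid JSON; with that comma removed, it reports the missing "entries" key.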

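    Wildcards are not permitted either. Each url must give the bucket name and the full object path of a single file, not a prefix or a pattern, so with ~1000 files per folder the manifest needs one entry per file.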
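With roughly 3000 files, the manifest is easier to generate than to write by hand. The following is a minimal sketch in Python, assuming boto3 is installed and has credentials for the bucket; S3Object, build_manifest and list_s3_objects are hypothetical names for this example. The part*.parquet wildcard from the question is applied client-side with fnmatch, so the manifest itself only contains exact object URLs. The Redshift documentation also describes a "meta" object with a "content_length" field (the file size in bytes) that is required when a manifest lists Parquet or ORC files, so the sketch fills it in from the S3 listing.

```python
import fnmatch
from typing import Dict, Iterable, List, NamedTuple


class S3Object(NamedTuple):
    bucket: str
    key: str
    size: int  # object size in bytes, as reported by S3


def build_manifest(objects: Iterable[S3Object], pattern: str,
                   mandatory: bool = True) -> Dict[str, list]:
    """Return a COPY manifest with one entry per object whose key matches pattern.

    pattern is a shell-style glob matched against the whole key with
    fnmatch.fnmatchcase; unlike a shell, its "*" also matches "/".
    """
    entries = []
    for obj in objects:
        if fnmatch.fnmatchcase(obj.key, pattern):
            entries.append({
                "url": f"s3://{obj.bucket}/{obj.key}",  # exact object, no wildcard
                "mandatory": mandatory,
                # File size in bytes; required for Parquet/ORC manifests.
                "meta": {"content_length": obj.size},
            })
    return {"entries": entries}


def list_s3_objects(bucket: str, prefix: str) -> List[S3Object]:
    """List every object under prefix (assumes boto3 and AWS credentials)."""
    import boto3  # imported here so build_manifest works without boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    objects = []
    # list_objects_v2 returns at most 1000 keys per call, hence the paginator.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for item in page.get("Contents", []):
            objects.append(S3Object(bucket, item["Key"], item["Size"]))
    return objects
```

For example, collect list_s3_objects("bucket_name", prefix) for the prefixes folder_name/folder_1/folder/, folder_name/folder_2/folder/ and folder_name/folder_3/folder/, pass the combined list to build_manifest with the pattern "folder_name/folder_*/folder/part*.parquet", write the result with json.dump, upload it to S3, and point COPY at it with the MANIFEST and FORMAT AS PARQUET options.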