amazon-web-services, amazon-redshift, amazon-redshift-spectrum

See all files in S3 bucket using Redshift Spectrum


We have S3 buckets with a nested folder structure like TeamName/Year/Month/Day/<Parquet files 1 - n>.

We are trying to create a Redshift Spectrum external table (using the Glue Data Catalog) over the S3 folder and query the data in Redshift. All the tutorials I have seen so far work with files directly under the root folder. How do we query multiple files in Redshift when they sit in nested folders in the bucket?

Also, if we add more files or folders, e.g. Day2/ParquetFiles, will Spectrum be able to detect them? Is there a way to create the Spectrum table on the root folder? The schema of all files will be the same.


Solution

  • Point the external table at the root folder. Spectrum should read any files under the given path, including those in subdirectories.

    Yes, you can add files or folders anywhere under that path, and they should be included the next time you query the table.

    From "Creating external tables for Redshift Spectrum" (Amazon Redshift documentation):

    "The external table statement defines the table columns, the format of your data files, and the location of your data in Amazon S3. Redshift Spectrum scans the files in the specified folder and any subfolders."
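As a rough illustration of the point above, the following Python sketch builds a CREATE EXTERNAL TABLE statement whose LOCATION is the top-level TeamName/ folder rather than a single Day folder. The bucket name (example-bucket), schema name (spectrum_schema), table name (team_events), and the two columns are all hypothetical placeholders, not taken from the question. It also assumes the external schema already exists and is backed by the Glue Data Catalog.

```python
def build_external_table_ddl(schema, table, columns, location):
    """Return a CREATE EXTERNAL TABLE statement for Parquet files under `location`."""
    # Normalize to a trailing "/" so the location names a folder,
    # in the same style as the examples in the AWS documentation.
    if not location.endswith("/"):
        location += "/"
    # One "name type" pair per line, comma-separated.
    column_lines = ",\n".join(f"    {name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n"
        f"{column_lines}\n"
        f")\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{location}';"
    )


# All identifiers below are hypothetical placeholders.
ddl = build_external_table_ddl(
    schema="spectrum_schema",
    table="team_events",
    columns=[("event_id", "bigint"), ("event_time", "timestamp")],
    location="s3://example-bucket/TeamName",  # root folder, not a single Day folder
)
print(ddl)
```

Running the printed statement in Redshift would, per the documentation quoted above, make the files under TeamName/ and its Year/Month/Day subfolders queryable through one table. The table in this sketch is unpartitioned, so each query scans everything under that prefix.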