Tags: amazon-web-services, amazon-s3, amazon-redshift, amazon-redshift-spectrum

How to skip files with specific extension on Redshift external tables?


I have a partitioned location on S3 with data I want to read through a Redshift external table (Redshift Spectrum), which I create with a CREATE EXTERNAL TABLE ... statement.

The only problem is that these partitions also contain metadata files, for example with a .txt extension, while the data I want to read is in .json files.
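For illustration only, a setup matching this description might look like the following. The schema name spectrum_schema, the bucket my-bucket, the column list, and the dt partition key are all hypothetical placeholders, not the asker's actual definition; the JSON SerDe is the one the Redshift documentation uses for JSON data, and the external schema is assumed to exist already.

```sql
-- Hypothetical layout, for illustration only:
--   s3://my-bucket/events/dt=2024-01-01/part-0000.json   <- data to read
--   s3://my-bucket/events/dt=2024-01-01/metadata.txt     <- file to skip
-- Assumes spectrum_schema was already created with CREATE EXTERNAL SCHEMA.
CREATE EXTERNAL TABLE spectrum_schema.events_raw (
    id         BIGINT,
    event_type VARCHAR(64)
)
PARTITIONED BY (dt DATE)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE
LOCATION 's3://my-bucket/events/';

-- Register one partition. The table reads the files under this prefix,
-- which is why the .txt metadata file sits alongside the .json data.
ALTER TABLE spectrum_schema.events_raw
ADD IF NOT EXISTS PARTITION (dt = '2024-01-01')
LOCATION 's3://my-bucket/events/dt=2024-01-01/';
```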

Is it possible to tell Redshift to skip those files, similar to the exclude patterns of a Glue crawler?


Solution

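  • You can try using the pseudocolumns and filtering on the file path. Redshift Spectrum external tables expose the "$path" and "$size" pseudocolumns, which hold the S3 path and size of the file each row was read from; they must be double-quoted in SQL. Filtering on "$path" (for example in the SELECT of a CREATE EXTERNAL TABLE AS) keeps only rows that came from .json files. Note that this is a query-time filter: the table definition itself still covers every file under its location. See https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_TABLE_usage.html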

    create external table as ....
    
    select .....
    where "$path" like '%.json'
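A slightly fuller sketch of the same idea, assuming a hypothetical external table spectrum_schema.events_raw that already covers the partitioned S3 prefix (all table names, columns, and the S3 location are placeholders). The first statement follows the CREATE EXTERNAL TABLE AS approach above and writes only the rows from .json files to a new location. The second is an alternative that avoids copying data by putting the same filter in a view; views that reference external tables must be late-binding (WITH NO SCHEMA BINDING), with schema-qualified table names.

```sql
-- All names and the S3 location are hypothetical placeholders.

-- Option 1: materialize only the .json rows into a new external table.
-- CREATE EXTERNAL TABLE AS writes the query result to the given S3 location.
CREATE EXTERNAL TABLE spectrum_schema.events_json
STORED AS PARQUET
LOCATION 's3://my-bucket/events_json/'
AS
SELECT id, event_type, dt
FROM spectrum_schema.events_raw
WHERE "$path" LIKE '%.json';

-- Option 2: keep reading the original files, but expose only rows that came
-- from .json files. Views over external tables must be late-binding.
CREATE VIEW public.events_json_only AS
SELECT id, event_type, dt
FROM spectrum_schema.events_raw
WHERE "$path" LIKE '%.json'
WITH NO SCHEMA BINDING;
```

Both options filter rows after Spectrum has located the files, so this sketch does not establish whether the JSON SerDe tolerates the contents of the .txt files; if they cause read errors, the filter alone will not help. If you control how the metadata files are named, the Redshift Spectrum documentation states that files whose names begin with a period, underscore, or hash mark (or end with a tilde) are ignored, which would keep them out of the table entirely.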