amazon-web-services amazon-redshift amazon-redshift-spectrum

Skipping header rows in AWS Redshift External Tables

I have a file in S3 with the following data:

name,age,gender
jill,30,f
jack,32,m

And a redshift external table to query that data using spectrum:

create external table spectrum.customers ( 
 "name" varchar(50),
 "age" int,
 "gender" varchar(1))
row format delimited
fields terminated by ','
lines terminated by \n'
stored as textfile 
location 's3://...';

When querying the data I get the following result:

select * from spectrum.customers;
name,age,g
jill,30,f
jack,32,m

Is there an elegant way to skip the header row as part of the external table definition, similar to the tblproperties ("skip.header.line.count"="1") option in Hive? Or is my only option (at least for now) to filter out the header rows as part of the select statement?

Solution

Answered this in: How to skip headers when we are reading data from a csv file in s3 and creating a table in aws athena.

This works in Redshift:

You want to use table properties ('skip.header.line.count'='1') Along with other properties if you want, e.g. 'numRows'='100'. Here's a sample:

create external table exreddb1.test_table
(ID BIGINT 
,NAME VARCHAR
)
row format delimited
fields terminated by ','
stored as textfile
location 's3://mybucket/myfolder/'
table properties ('numRows'='100', 'skip.header.line.count'='1');