I have a file in S3 with the following data:
name,age,gender
jill,30,f
jack,32,m
And a redshift external table to query that data using spectrum:
create external table spectrum.customers (
"name" varchar(50),
"age" int,
"gender" varchar(1))
row format delimited
fields terminated by ','
lines terminated by \n'
stored as textfile
location 's3://...';
When querying the data I get the following result:
select * from spectrum.customers;
name,age,g
jill,30,f
jack,32,m
Is there an elegant way to skip the header row as part of the external table definition, similar to the tblproperties ("skip.header.line.count"="1")
option in Hive? Or is my only option (at least for now) to filter out the header rows as part of the select statement?
Answered this in: How to skip headers when we are reading data from a csv file in s3 and creating a table in aws athena.
This works in Redshift:
You want to use table properties ('skip.header.line.count'='1')
Along with other properties if you want, e.g. 'numRows'='100'
.
Here's a sample:
create external table exreddb1.test_table
(ID BIGINT
,NAME VARCHAR
)
row format delimited
fields terminated by ','
stored as textfile
location 's3://mybucket/myfolder/'
table properties ('numRows'='100', 'skip.header.line.count'='1');