The code below ignores the header in the CSV file, so the schema is inferred as c1, c2, c3 and the header columns come through as just another data row. Below is sample data from the relevant file:
Series_reference Period Data_value Suppressed STATUS UNITS Magnitude Subject Group Series_title_1
BDCQ.SEA1AA 2011.06 80078 F Number 0 Business Data Collection - BDC Industry by employment variable Filled jobs
create or replace file format mybusinesscsv
type = csv
RECORD_DELIMITER = '\n'
skip_header = 0;
SELECT *
FROM TABLE(
INFER_SCHEMA(
LOCATION=>'@quarterdatastg/employment_1.csv.gz',
file_format=>'mybusinesscsv'
)
);
The point Nick is making is that the PARSE_HEADER option is also required.
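As a minimal sketch (untested here), applying that to the question's file format means recreating it with PARSE_HEADER enabled and re-running the same INFER_SCHEMA query; the header row should then supply the column names instead of c1, c2, c3:
create or replace file format mybusinesscsv
  type = csv
  RECORD_DELIMITER = '\n'
  PARSE_HEADER = true;  -- use the first row as column names during schema inference
SELECT *
FROM TABLE(
  INFER_SCHEMA(
    LOCATION=>'@quarterdatastg/employment_1.csv.gz',
    file_format=>'mybusinesscsv'
  )
);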
We had two file formats, one for the LOAD and the other for the INFER stage:
create or replace file format DB_NAME.SCHEMA_NAME.CSV_FORMAT
type = CSV RECORD_DELIMITER ='\n', FIELD_DELIMITER=',',
SKIP_HEADER=1, PARSE_HEADER=false, ESCAPE_UNENCLOSED_FIELD='\\',
FIELD_OPTIONALLY_ENCLOSED_BY='\"';
create or replace file format DB_NAME.SCHEMA_NAME.CSV_FORMAT_HEADER
type = CSV RECORD_DELIMITER ='\n', FIELD_DELIMITER=',',
SKIP_HEADER=0, PARSE_HEADER=true, ESCAPE_UNENCLOSED_FIELD='\\',
FIELD_OPTIONALLY_ENCLOSED_BY='\"';
Thus the INFER looked like:
FROM TABLE(INFER_SCHEMA(
LOCATION => ''$$ || S3_FOLDER || $$'',
FILE_FORMAT => ''DB_NAME.SCHEMA_NAME.CSV_FORMAT_HEADER'',
IGNORE_CASE => TRUE
))
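(The doubled quotes and the $$ || S3_FOLDER || $$ concatenation suggest this fragment sits inside dynamically built SQL, which the answer doesn't show in full. As a rough, non-dynamic sketch, the inferred schema can feed table creation via Snowflake's CREATE TABLE ... USING TEMPLATE pattern; the table and stage names here are just reused from the COPY example below and may not match the actual setup:)
CREATE TABLE DB_NAME.SCHEMA_NAME.TABLE_NAME
  USING TEMPLATE (
    -- build the column list from the INFER_SCHEMA result
    SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
    FROM TABLE(INFER_SCHEMA(
      LOCATION => '@DB_NAME.SCHEMA_NAME.MEANINGFUL_NAME_STAGE/table_path_tokens',
      FILE_FORMAT => 'DB_NAME.SCHEMA_NAME.CSV_FORMAT_HEADER',
      IGNORE_CASE => TRUE
    ))
  );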
and these were used in COPY statements like:
COPY INTO DB_NAME.SCHEMA_NAME.TABLE_NAME
FROM @DB_NAME.SCHEMA_NAME.MEANINGFUL_NAME_STAGE/table_path_tokens
FILE_FORMAT = DB_NAME.SCHEMA_NAME.CSV_FORMAT;