Search code examples
csvclickhouse

Parse error with a simple CSV import into Clickhouse


I have an extremely simple CSV I'm trying to import into clickhouse with no success. The create table statement is:

CREATE TABLE staging.EloLBK
(
    `Month` DateTime64(3),
    `1958` Int32,
    `1959` Int32,
    `1960` Int32
)
ENGINE = MergeTree
PRIMARY KEY Month
ORDER BY Month
SETTINGS index_granularity = 8192

The CSV data looks like this:

"Month", "1958", "1959", "1960"
"JAN",  340,  360,  417
"FEB",  318,  342,  391
"MAR",  362,  406,  419
"APR",  348,  396,  461
"MAY",  363,  420,  472
"JUN",  435,  472,  535
"JUL",  491,  548,  622
"AUG",  505,  559,  606
"SEP",  404,  463,  508
"OCT",  359,  407,  461
"NOV",  310,  362,  390
"DEC",  337,  405,  432

My import statement is:

INSERT INTO EloLBK SELECT * FROM file('EloLBK/*.csv', 'CSVWithNames', '"Month" datetime64, "1958" integer, "1959" integer, "1960" integer')

The error coming back from clickhouse is:

Code: 27. DB::Exception: Cannot parse input: expected '"' before: '417\n"FEB",  318,  342,  391\n"MAR",  362,  406,  419\n"APR",  348,  396,  461\n"MAY",  363,  420,  472\n"JUN",  435,  472,  535\n"JUL",  491,  548,  622\n"AUG",  505,':
Row 1:
Column 0,   name: Month, type: DateTime64(3), parsed text: "<DOUBLE QUOTE>JAN<DOUBLE QUOTE>,  340,  360,  "ERROR
Code: 27. DB::ParsingException: Cannot parse input: expected '"' before: '417\n"FEB",  318,  342,  391\n"MAR",  362,  406,  419\n"APR",  348,  396,  461\n"MAY",  363,  420,  472\n"JUN",  435,  472,  535\n"JUL",  491,  548,  622\n"AUG",  505,'. (CANNOT_PARSE_INPUT_ASSERTION_FAILED) (version 21.12.1.8928 (official build))

: While executing CSVRowInputFormat: While executing File. (CANNOT_PARSE_INPUT_ASSERTION_FAILED)

I'm not sure how to resolve this so any advice would be appreciated!


Solution

  • You can tell ClickHouse to do a best effort guess by passing the option date_time_input_format='best_effort', for example:

    INSERT INTO EloLBK SELECT * FROM file('EloLBK/*.csv', 'CSVWithNames', '"Month" datetime64, "1958" integer, "1959" integer, "1960" integer') settings date_time_input_format='best_effort';
    

    Will lead to:

    ┌───────────────────Month─┬─1958─┬─1959─┬─1960─┐
    │ 2000-01-01 00:00:00.000 │  340 │  360 │  417 │
    │ 2000-02-01 00:00:00.000 │  318 │  342 │  391 │
    │ 2000-03-01 00:00:00.000 │  362 │  406 │  419 │
    │ 2000-04-01 00:00:00.000 │  348 │  396 │  461 │
    │ 2000-05-01 00:00:00.000 │  363 │  420 │  472 │
    │ 2000-06-01 00:00:00.000 │  435 │  472 │  535 │
    │ 2000-07-01 00:00:00.000 │  491 │  548 │  622 │
    │ 2000-08-01 00:00:00.000 │  505 │  559 │  606 │
    │ 2000-09-01 00:00:00.000 │  404 │  463 │  508 │
    │ 2000-10-01 00:00:00.000 │  359 │  407 │  461 │
    │ 2000-11-01 00:00:00.000 │  310 │  362 │  390 │
    │ 2000-12-01 00:00:00.000 │  337 │  405 │  432 │
    └─────────────────────────┴──────┴──────┴──────┘