I am working on some benchmarks and need to compare the ORC, Parquet, and CSV formats. I have exported TPC-H (SF1000) to ORC-based tables. To export the data to Parquet, I can run:
CREATE TABLE hive.tpch_sf1_parquet.region
WITH (format = 'parquet')
AS SELECT * FROM hive.tpch_sf1_orc.region
When I try the same approach with CSV, I get the error: Hive CSV storage format only supports VARCHAR (unbounded). I would have assumed that it would convert the other data types (e.g. bigint) to text and store the original column types in the Hive metadata.
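For reference, the failing statement looks like this (the tpch_sf1_csv schema name is just illustrative):
CREATE TABLE hive.tpch_sf1_csv.region
WITH (format = 'CSV')
AS SELECT * FROM hive.tpch_sf1_orc.region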
I can export the data to CSV using the Trino CLI: trino --server trino:8080 --catalog hive --schema tpch_sf1_orc --output-format=CSV --execute 'SELECT * FROM nation', but then the result is emitted to a client-side file. Although this works for SF1, it quickly becomes unusable at the SF1000 scale factor. Another disadvantage is that my Hive metastore wouldn't have the appropriate metadata (although I could patch it manually if nothing else works).
Does anyone have an idea how to convert my ORC/Parquet data to CSV using Hive?
In the Trino Hive connector, a CSV table can contain varchar columns only. You need to cast the exported columns to varchar when creating the table:
CREATE TABLE region_csv
WITH (format = 'CSV')
AS SELECT
    CAST(regionkey AS varchar) AS regionkey,
    CAST(name AS varchar) AS name,
    CAST(comment AS varchar) AS comment
FROM region_orc
Note that you will need to update your benchmark queries accordingly, e.g. by applying reverse casts.
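For example, a query that filters on regionkey would cast it back to bigint (a sketch based on the region table above; adapt it to your actual benchmark queries):
SELECT name
FROM region_csv
WHERE CAST(regionkey AS bigint) = 1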