
Export ClickHouse table to S3 without using LIMIT


I'm trying to export a ClickHouse table to S3 following the official docs, using this command:

INSERT INTO FUNCTION
   s3(
       'https://my-bucket.s3.amazonaws.com/files/table.csv.lz4',
       'aws_key',
       'aws_secret',
       'CSV'
    )
SELECT *
FROM table
SETTINGS max_execution_time = 20

but if I don't set a LIMIT, or if I set it a little over 10000, I get the following error:

Code: 617. DB::Exception: LZ4 failed to encode stream. LZ4F version: 100. (LZ4_ENCODER_FAILED) (version 23.5.3.24 (official build))

My table is not too big: it has ~120k rows and ~10 columns. If I set LIMIT 10000, it works fine. The same thing happens when I export with partitions, whenever a partition is larger than ~10000 rows.
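
For reference, the partitioned variant I mean looks roughly like this; the PARTITION BY expression is just an example (`id` is a hypothetical numeric column in my table):

INSERT INTO FUNCTION
   s3(
       'https://my-bucket.s3.amazonaws.com/files/table_{_partition_id}.csv.lz4',
       'aws_key',
       'aws_secret',
       'CSV'
    )
PARTITION BY id % 16  -- 'id' stands in for any column/expression used to split the output files
SELECT *
FROM table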

I'm using ClickHouse version 23.5.3.24 from the official Docker image with the default config.

Question: is there a limit on how big these exports/partitions can be, and if so, are there docs about it that I missed? How do I know what size to make the exports/partitions? Is there anything that should be set in the configuration for this?

Thanks!


Solution

  • Do you need csv.lz4, or is Parquet acceptable as well? You can probably work around this by using snappy compression. In some earlier ClickHouse versions the default Parquet compression was snappy; it was later changed to LZ4, which is possibly not yet well fine-tuned.

    You can get more context here.

    For example, in newer versions of ClickHouse I use the following setting:

    output_format_parquet_compression_method='snappy'
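
    Applied inline, a sketch of the question's export rewritten to Parquet with snappy; the bucket path and credentials are the question's placeholders:

    INSERT INTO FUNCTION
       s3(
           'https://my-bucket.s3.amazonaws.com/files/table.parquet',
           'aws_key',
           'aws_secret',
           'Parquet'
        )
    SELECT *
    FROM table
    SETTINGS output_format_parquet_compression_method = 'snappy'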
    

    I'm using the following settings in my config file, and it works:

    clickhouse:
      profiles:
        default/max_query_size: 1000000
        default/output_format_parquet_string_as_string: true
        default/output_format_parquet_compression_method: "snappy"
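
    To check that the profile settings actually took effect, you can query system.settings from a client session (a quick sanity check I'd add, not part of the original answer):

    SELECT name, value, changed
    FROM system.settings
    WHERE name LIKE 'output_format_parquet%'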
    

    And I export the data to Parquet files with the S3 table engine:

    ENGINE = S3('{s3_path}/{file_name}.parquet', 'Parquet');
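
    A fuller sketch of that approach; the table name and columns here are hypothetical, and the URL reuses the question's bucket in place of the {s3_path}/{file_name} placeholders:

    -- hypothetical export table; the column list must match your source table
    CREATE TABLE table_export (id UInt64, value String)
    ENGINE = S3('https://my-bucket.s3.amazonaws.com/files/table.parquet', 'Parquet');

    INSERT INTO table_export SELECT id, value FROM table;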
    

    I also recommend running such potentially big queries directly through clickhouse-client.