
clickhouse-client error "Code: 36. DB::Exception: Positional options are not supported. (BAD_ARGUMENTS)" in a bash script


Here is my bash script for inserting Parquet files into ClickHouse in parallel. It keeps giving me the error in the title, though, and I don't know why. Any help is appreciated.

#!/bin/bash
time (for FILENAME in /mnt/sdc/traces/part-*.snappy.parquet; do
            echo $FILENAME
            xargs -P 6 -n 1 -0 clickhouse-client --receive_timeout=100000 --query=\"INSERT INTO ethereum.traces FORMAT Parquet\" < $FILENAME
        done)

Solution
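Separately from how xargs is used, the error in the title most likely comes from the backslash-escaped quotes in --query=\"INSERT INTO ethereum.traces FORMAT Parquet\". Escaping makes the quotes literal characters, so bash splits the query at its spaces and clickhouse-client receives INTO, ethereum.traces, FORMAT and Parquet" as extra positional arguments, which it rejects. The sketch below uses a small hypothetical helper, show_args (not part of the original script), that prints each argument bash actually passes:

```shell
#!/bin/bash
# Hypothetical helper: print every argument it receives in brackets,
# one per line, so word splitting is visible.
show_args() {
  printf '[%s]\n' "$@"
}

echo 'Escaped quotes (as in the question):'
show_args --query=\"INSERT INTO ethereum.traces FORMAT Parquet\"

echo 'Real quotes (as in the answer):'
show_args --query="INSERT INTO ethereum.traces FORMAT Parquet"
```

The first call prints five separate arguments, starting with [--query="INSERT]; the second prints a single argument, which is what clickhouse-client expects for --query.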

  • One way to run the inserts in parallel would look like:

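    #!/bin/bash
    cpu_count=6    # number of parallel workers (xargs -P)
    batch_size=4   # file names handed to each sh invocation (xargs -n)
    
    # Emit the file names NUL-delimited so names containing spaces or newlines survive intact
    printf '%s\0' /mnt/sdc/traces/part-*.snappy.parquet |
      xargs -P"$cpu_count" -n"$batch_size" -0 sh -c '
        for filename in "$@"; do
          echo "$filename"
          # Stream each file to clickhouse-client on stdin
          clickhouse-client --receive_timeout=100000 --query="INSERT INTO ethereum.traces FORMAT Parquet" <"$filename"
        done
      ' _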
    
    • xargs expects to read, on its stdin, a list of arguments to pass to the program it invokes. Your original code instead redirected the contents of each Parquet file into xargs's stdin; here, we pass it a NUL-delimited list of Parquet file names.
    • The -n argument to xargs tells it how many file names to pass to each copy of sh. A small value such as 1 spreads the last few files evenly across the workers instead of leaving some workers idle while others still have several files queued, but it means paying the overhead of starting a new shell for every file.
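To see how the NUL-delimited list and -n interact without touching real data, here is a small sketch. The file names are dummies and show_batches is a hypothetical helper; echo stands in for clickhouse-client, and -P is left out so the output order is deterministic:

```shell
#!/bin/bash
# Hypothetical helper: show how xargs groups NUL-delimited names
# into batches of size $1. The file names are dummies.
show_batches() {
  local batch_size=$1
  printf '%s\0' part-1.parquet part-2.parquet part-3.parquet part-4.parquet part-5.parquet |
    xargs -0 -n"$batch_size" sh -c '
      # "$@" holds this batch of names; the "_" below fills $0.
      echo "batch: $*"
    ' _
}

show_batches 2
```

With a batch size of 2, this prints three lines: two batches of two names and a final batch holding only part-5.parquet. Adding -P6 to the same xargs command would run those batches concurrently, so the lines could then appear in any order.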