Search code examples
window-functionsclickhouse

argMax uses only one thread and runs slow in clickhouse


I have this query

SELECT argMax(IFNULL(A, 0), B) over (partition by C)
FROM db.table
WHERE B between 0 AND 10000000;

The problems are:

  1. It uses only one thread (though I have 4 shards in a cluster, Distributed source table, many CPU's, max_threads=0)
  2. It works really slow (query takes multiple minutes to finish

How to fix them?

Additional info is:

Column types:

A - int32
B - int32
C - int32

EXPLAIN PIPELINE

(Expression)
ExpressionTransform
  (Window)
  WindowTransform
    (Sorting)
    MergingSortedTransform 19 → 1
      MergeSortingTransform × 19
        LimitsCheckingTransform × 19
          PartialSortingTransform × 19
            (Union)
              (Expression)
              ExpressionTransform × 16
                (ReadFromMergeTree)
                MergeTreeThread × 16 0 → 1
              (ReadFromRemote)

TABLES SETTINGS:

table: ENGINE = Distributed(table_loc, sharding_key=C)

local table_loc: ENGINE = MergeTree ORDER BY B SETTINGS index_granularity = 8192


Solution

  • WINDOW FUNCTION over (partition by C) need to serialize all rows and it's done by a single thread at the server initiator inside Distributed table

    One of the solutions would be to calculate without WINDOW FUNCTION, using many threads and at shards