Search code examples
performancequery-optimizationclickhouse

Performance of tuple-based columns


I'm building an event-based system where event properties (defined as columns) can be one of 3 data types (string, boolean, Float32). I'm trying to decide between creating a separate column for each type, or each column is a tuple.

For example...

one column per type:

CREATE TABLE event
(
  event_name String,
  timestamp DateTime DEFAULT now(),
  
  -- Property with 3 types
  prop1_str String,
  prop1_num Float32,
  prop1_bool Boolean
)
ENGINE = MergeTree ORDER BY (event_name, timestamp)

one column per property with 3-type tuple:

CREATE TABLE event
(
  event_name String,
  timestamp DateTime DEFAULT now(),
  
  -- Property with 3 types
  prop1 Tuple(
    str String, 
    num Float32, 
    bool Boolean
  )
)
ENGINE = MergeTree ORDER BY (event_name, timestamp)

The one column per property feels more organized, but my gut suspects that this might be less performant for querying.


Solution

  • Old versions (20...) read all tuple columns even if you query only one element of a Tuple. This issue is resolved.

    There is no difference for modern versions of Clickhouse. The same storage and the same compute is in both cases.

    3 columns

    insert into event select '', now(), toString(number), number, number%2 from numbers(1e8);
    0 rows in set. Elapsed: 9.311 sec. Processed 100.65 million rows, 805.21 MB (10.81 million rows/s., 86.48 MB/s.)
    
    select count() from event where prop1_num > 42;
    1 row in set. Elapsed: 0.116 sec. Processed 100.00 million rows, 400.00 MB (862.20 million rows/s., 3.45 GB/s.)
    
    select count() from event where prop1_str like '%42%';
    1 row in set. Elapsed: 1.135 sec. Processed 100.00 million rows, 1.69 GB (88.08 million rows/s., 1.49 GB/s.)
    

    tuple

    insert into event select '', now(), tuple(toString(number), number, number%2) from numbers(1e8);
    0 rows in set. Elapsed: 8.807 sec. Processed 100.65 million rows, 805.21 MB (11.43 million rows/s., 91.43 MB/s.)
    
    select count() from event where prop1.num > 42;
    1 row in set. Elapsed: 0.117 sec. Processed 100.00 million rows, 400.00 MB (854.28 million rows/s., 3.42 GB/s.)
    
    select count() from event where prop1.str like '%42%';
    1 row in set. Elapsed: 1.129 sec. Processed 100.00 million rows, 1.69 GB (88.59 million rows/s., 1.50 GB/s.)