I'm building an event-based system where event properties (defined as columns) can be one of 3 data types (string, boolean, Float32). I'm trying to decide between creating a separate column for each type, or each column is a tuple.
For example...
one column per type:
CREATE TABLE event
(
event_name String,
timestamp DateTime DEFAULT now(),
-- Property with 3 types
prop1_str String,
prop1_num Float32,
prop1_bool Boolean
)
ENGINE = MergeTree ORDER BY (event_name, timestamp)
one column per property with 3-type tuple:
CREATE TABLE event
(
event_name String,
timestamp DateTime DEFAULT now(),
-- Property with 3 types
prop1 Tuple(
str String,
num Float32,
bool Boolean
)
)
ENGINE = MergeTree ORDER BY (event_name, timestamp)
The one column per property feels more organized, but my gut suspects that this might be less performant for querying.
Old versions (20...) read all tuple columns even if you query only one element of a Tuple. This issue is resolved.
There is no difference for modern versions of Clickhouse. The same storage and the same compute is in both cases.
3 columns
insert into event select '', now(), toString(number), number, number%2 from numbers(1e8);
0 rows in set. Elapsed: 9.311 sec. Processed 100.65 million rows, 805.21 MB (10.81 million rows/s., 86.48 MB/s.)
select count() from event where prop1_num > 42;
1 row in set. Elapsed: 0.116 sec. Processed 100.00 million rows, 400.00 MB (862.20 million rows/s., 3.45 GB/s.)
select count() from event where prop1_str like '%42%';
1 row in set. Elapsed: 1.135 sec. Processed 100.00 million rows, 1.69 GB (88.08 million rows/s., 1.49 GB/s.)
tuple
insert into event select '', now(), tuple(toString(number), number, number%2) from numbers(1e8);
0 rows in set. Elapsed: 8.807 sec. Processed 100.65 million rows, 805.21 MB (11.43 million rows/s., 91.43 MB/s.)
select count() from event where prop1.num > 42;
1 row in set. Elapsed: 0.117 sec. Processed 100.00 million rows, 400.00 MB (854.28 million rows/s., 3.42 GB/s.)
select count() from event where prop1.str like '%42%';
1 row in set. Elapsed: 1.129 sec. Processed 100.00 million rows, 1.69 GB (88.59 million rows/s., 1.50 GB/s.)