I store statistics data in MySQL and want to migrate it to the Clickhouse.
For one entity I have multiple statistics types. For examples: clicks
, views
, upvotes
And now I store it in the table which looks like: entity_id | datetime | type | value
, where type
is a constant integer which corresponds to one of statistics types described above.
Is it better to keep the same structure in Clickhouse or to create the next schema: entity_id | datetime | clicks | views | upvotes
As I understood, the Clickhouse is column-oriented DB. Does it mean that the Clickhouse has zero-cost altering when we want to add new columns, when it will be necessary?
Is it better to keep the same structure in Clickhouse or to create the next schema
if clicks | views | upvotes
are related to same entity_id
, then it makes sense to create a column for each value. By doing this, you will run one query and get a row containing all the data you need.
On the other hand, if you create table as entity_id | datetime | type | value
, then you will have flexibility to have more statistic types(maybe mouse_enter
as a 4th type). In the same time, you will have repeated entity_id
and datetime
values which will cost you disk usage and your table will have 3x rows. As you can see, there is a trade-off between each decision. It's up to you to decide which one to choose. If you think that only clicks | views | upvotes
will be enough then it makes much more sense to use this schema.
Clickhouse is column-oriented DB. Does it mean that the Clickhouse has zero-cost altering when we want to add new columns, when it will be necessary
Yes, Clickhouse is column-oriented and each column is a separate file(s) on the disk. So there will be little to none overhead if you create a new column. But altering a column will take time if you use table mutations or alter the type, depending on the operation.