Search code examples
mysqlstatisticsdatabase-administrationclickhouse

Clickhouse columns structure


I store statistics data in MySQL and want to migrate it to the Clickhouse.

For one entity I have multiple statistics types. For examples: clicks, views, upvotes

And now I store it in the table which looks like: entity_id | datetime | type | value, where type is a constant integer which corresponds to one of statistics types described above.

Is it better to keep the same structure in Clickhouse or to create the next schema: entity_id | datetime | clicks | views | upvotes

As I understood, the Clickhouse is column-oriented DB. Does it mean that the Clickhouse has zero-cost altering when we want to add new columns, when it will be necessary?


Solution

  • Is it better to keep the same structure in Clickhouse or to create the next schema

    if clicks | views | upvotes are related to same entity_id, then it makes sense to create a column for each value. By doing this, you will run one query and get a row containing all the data you need.

    On the other hand, if you create table as entity_id | datetime | type | value, then you will have flexibility to have more statistic types(maybe mouse_enter as a 4th type). In the same time, you will have repeated entity_id and datetime values which will cost you disk usage and your table will have 3x rows. As you can see, there is a trade-off between each decision. It's up to you to decide which one to choose. If you think that only clicks | views | upvotes will be enough then it makes much more sense to use this schema.

    Clickhouse is column-oriented DB. Does it mean that the Clickhouse has zero-cost altering when we want to add new columns, when it will be necessary

    Yes, Clickhouse is column-oriented and each column is a separate file(s) on the disk. So there will be little to none overhead if you create a new column. But altering a column will take time if you use table mutations or alter the type, depending on the operation.