Search code examples
searchcase-insensitivedruidclickhouse

Problems using ClickHouse


We are revamping our existing system, which uses MYSQL DB to deal with the following type of data.

  • transaction and order related data
  • customers information
  • products information

We need to query on these data and pull in statistical data, and also filter, facet and segment list and KPIs.

We tried ClickHouse, Druid, DGraph did a few tests on sample data to benchmark and to check which DB fits our needs.

Few things I liked about Druid DB are,

We found ClickHouse to be faster when compared to MYSQL and Druid databases. But have the following problems.

  • Unable to do druid-like-search queries (which return dimension and occurrences). Any workaround to achieve this?
  • Case insensitive search. How do we handle this? ClickHouse is case-sensitive, right?
  • utf8mb4 support? How do we save/store special characters or few emoji's which are not supported on utf8?
    We had similar issues in MYSQL, and changing the collation to utf8mb4 solved it. What do we in ClickHouse to achieve this?

Your suggestions can help us overcome these challenges and make a better decision.

Thanks in advance.


Solution

  • Unable to do druid-like-search queries (which return dimension and occurrences). Any workaround to achieve this?

    That feature sounds to work roughly like:

    SELECT interval, dim1, COUNT(*) FROM my_table WHERE condition GROUP BY interval, dim1
    UNION ALL
    SELECT interval, dim2, COUNT(*) FROM my_table WHERE condition GROUP BY interval, dim2
    UNION ALL
    ...
    

    Case insensitive search. How do we handle this? ClickHouse is case-sensitive, right?

    There are multiple options, for example positionCaseInsensitiveUTF8(haystack, needle) function or match with regular expressions: https://clickhouse.yandex/docs/en/query_language/functions/string_search_functions/#match-haystack-pattern

    utf8mb4 support? How do we save/store special characters or few emoji's which are not supported on utf8?

    Strings in ClickHouse are arbitrary byte sequences, so you can store whatever you want there, but you should probably check whether the available functions match your usecase.