We are revamping our existing system, which uses MYSQL DB to deal with the following type of data.
We need to query on these data and pull in statistical data, and also filter, facet and segment list and KPIs.
We tried ClickHouse, Druid, DGraph did a few tests on sample data to benchmark and to check which DB fits our needs.
Few things I liked about Druid DB are,
We found ClickHouse to be faster when compared to MYSQL and Druid databases. But have the following problems.
Your suggestions can help us overcome these challenges and make a better decision.
Thanks in advance.
Unable to do druid-like-search queries (which return dimension and occurrences). Any workaround to achieve this?
That feature sounds to work roughly like:
SELECT interval, dim1, COUNT(*) FROM my_table WHERE condition GROUP BY interval, dim1
UNION ALL
SELECT interval, dim2, COUNT(*) FROM my_table WHERE condition GROUP BY interval, dim2
UNION ALL
...
Case insensitive search. How do we handle this? ClickHouse is case-sensitive, right?
There are multiple options, for example positionCaseInsensitiveUTF8(haystack, needle)
function or match with regular expressions: https://clickhouse.yandex/docs/en/query_language/functions/string_search_functions/#match-haystack-pattern
utf8mb4 support? How do we save/store special characters or few emoji's which are not supported on utf8?
Strings in ClickHouse are arbitrary byte sequences, so you can store whatever you want there, but you should probably check whether the available functions match your usecase.