Search code examples
clickhouse

Multiple small inserts in clickhouse


I have an event table (MergeTree) in clickhouse and want to run a lot of small inserts at the same time. However the server becomes overloaded and unresponsive. Moreover, some of the inserts are lost. There are a lot of records in clickhouse error log:

01:43:01.668 [ 16 ] <Error> events (Merger): Part 201 61109_20161109_240760_266738_51 intersects previous part

Is there a way to optimize such queries? I know I can use bulk insert for some types of events. Basically, running one insert with many records, which clickhouse handles pretty well. However, some of the events, such as clicks or opens could not be handled in this way.

The other question: why clickhouse decides that similar records exist, when they don't? There are similar records at the time of insert, which have the same fields as in index, but other fields are different.

From time to time I also receive the following error:

Caused by: ru.yandex.clickhouse.except.ClickHouseUnknownException: ClickHouse exception, message: Connect to localhost:8123 [ip6-localhost/0:0:0:0:0:0:0:1] timed out, host: localhost, port: 8123; Connect to ip6-localhost:8123 [ip6-localhost/0:0:0:0:0:0:0:1] timed out
    ... 36 more

Mostly during project build when test against clickhouse database are run.


Solution

  • This is known issue when processing large number of small inserts into (non-replicated) MergeTree.

    This is a bug, we need to investigate and fix.

    For workaround, you should send inserts in larger batches, as recommended: about one batch per second: https://clickhouse.tech/docs/en/introduction/performance/#performance-when-inserting-data.