Search code examples
clickhouse

Clickhose error on joining with pointInPoligon


I need to handle data in tables using mat view.

CREATE TABLE test.zone
(
    `id` UInt32,
    `district` String,
    `city` String,
    `polygon` Polygon
)
ENGINE = Memory;

CREATE TABLE test.source1
(
    id UInt32,
    point Point
)
ENGINE = Null;

CREATE TABLE test.by_zone1
(
    `id` UInt32,
    `zone_id` UInt32,
    `count` UInt32
)
ENGINE = AggregatingMergeTree
ORDER BY (zone_id, id);

CREATE MATERIALIZED VIEW test.mv_by_zone
TO test.by_zone1
AS SELECT
    z.id AS zone_id,
    s.id AS id,
    count(s.id) AS count
FROM test.`source1` s 
JOIN test.`zone` z ON pointInPolygon(s.point, z.polygon) 
GROUP BY z.id, s.id;

For materialized view I got this error

SQL Error [403] [07000]: Code: 403. DB::Exception: JOIN  test.source1 AS __table1 
ALL INNER JOIN test.zone AS __table2 ON pointInPolygon(__table1.point, __table2.polygon) 
join expression contains column from left and right table, you may try experimental support 
of this feature by `SET allow_experimental_join_condition = 1`. (INVALID_JOIN_ON_EXPRESSION) 
(version 24.5.1.1065 (official build))

Am I right that I can't use pointInPolygon in this case or there is some mistake in query?


Solution

  • I wouldn't recommend doing joins on a MV - the hash table needs to be built for all insert blocks. This is going to dramatically slow down inserts.

    You should try to use a dictionary or Engine=JOIN table - the latter will ensure the hash table is pre-built in memory. For your usecase you can use a polygon dictionary to hold your zone data (depends on size and example https://clickhouse.com/blog/real-world-data-noaa-climate-data)

    https://clickhouse.com/docs/en/sql-reference/dictionaries#polygon-dictionaries

    Note this join will only be triggered for inserts into source1.

    If you must do what you are doing include 'SETTINGS allow_experimental_join_condition = 1' in your MV query.