Search code examples
sqlleft-joinclickhouse

Clickhouse: "left join" table to external array


Suppose there is a table:

CREATE TABLE details(id String, detail_one UInt64, detail_two UInt64) ENGINE = Memory;
INSERT INTO TABLE details(id, detail_one, detail_two) VALUES ('id1', 5, 10), ('id2', 20, 30);

I want to be able to "left join" it to an external array of input values, so that I won't have to perform additional hash map based lookups after I get the resulting set from the database. This includes solutions that I've tried with a plain WHERE IN and UNION over this input array. Is it a bad idea in general? Can you suggest a better alternative?

Something like this:

SELECT *
FROM ['id1', 'id2', 'id3-missing'] AS input
LEFT JOIN details ON details.id = input;

So that the resulting set is as follows:

{
    ('id2', 5, 10), 
    ('id1', 20, 30),
    ('id3-missing', NULL, NULL) # basically return NULLs
}

Notes:

  1. It is a rather small table: less than 10000 rows (mostly columns) and won't change much in this regard, the input array can be sized similarly.
  2. It has to support extra aggregations that I'm going to perform on top of it.

Solution

  • Consider using arrayJoin function:

    SELECT *
    FROM
    (
        SELECT arrayJoin(['id1', 'id2', 'id3-missing']) AS id
    ) AS input
    LEFT JOIN details ON details.id = input.id
    SETTINGS join_use_nulls = 1
    
    /*
    ┌─id──────────┬─details.id─┬─detail_one─┬─detail_two─┐
    │ id1         │ id1        │          5 │         10 │
    │ id2         │ id2        │         20 │         30 │
    │ id3-missing │ ᴺᵁᴸᴸ       │       ᴺᵁᴸᴸ │       ᴺᵁᴸᴸ │
    └─────────────┴────────────┴────────────┴────────────┘
    */
    

    or

    SELECT *
    FROM details
    RIGHT JOIN
    (
        SELECT arrayJoin(['id1', 'id2', 'id3-missing']) AS id
    ) AS input ON details.id = input.id
    SETTINGS join_use_nulls = 1
    
    /*
    ┌─id──┬─detail_one─┬─detail_two─┬─input.id─┐
    │ id1 │          5 │         10 │ id1      │
    │ id2 │         20 │         30 │ id2      │
    └─────┴────────────┴────────────┴──────────┘
    ┌─id───┬─detail_one─┬─detail_two─┬─input.id────┐
    │ ᴺᵁᴸᴸ │       ᴺᵁᴸᴸ │       ᴺᵁᴸᴸ │ id3-missing │
    └──────┴────────────┴────────────┴─────────────┘
    */
    

    SELECT
        *,
        input.tuple.1 AS array_id,
        input.tuple.2 AS array_value
    FROM
    (
        SELECT arrayJoin([('id1', 100), ('id2', 200), ('id3-missing', 300)]) AS tuple
    ) AS input
    LEFT JOIN details ON details.id = (input.tuple.1)
    
    /*
    ┌─tuple───────────────┬─id──┬─detail_one─┬─detail_two─┬─array_id────┬─array_value─┐
    │ ('id1',100)         │ id1 │          5 │         10 │ id1         │         100 │
    │ ('id2',200)         │ id2 │         20 │         30 │ id2         │         200 │
    │ ('id3-missing',300) │     │          0 │          0 │ id3-missing │         300 │
    └─────────────────────┴─────┴────────────┴────────────┴─────────────┴─────────────┘
    */