Search code examples
sqlclickhouse

Matching Array Elements in a Table and Returning Results in a New Field using clickhouse


I have a table with two fields, the first one, field "path", is a value and the second one, field "arrKW", is an array of strings.

+---------------------+----------------+
| path                | arrKW          |
+---------------------+----------------+
| folder/puntonet     | ['kw1','kw2'] |
| folder/puntonet-2.0 | ['kw2','kw3']  |
| folder/puntonet-4   | ['kw2','kw4'] |
| folder/puntonet-5   | ['kw5','kw4']  |
+---------------------+----------------+

I would like to obtain, for each field "path", an array of elements where there is a match with any element from field "arrKW",but not including the same value as in field "path".

+---------------------+---------------------+---------------------------------------------+
| path                | arrKW               | Result Field                                |
+---------------------+---------------------+---------------------------------------------+
| folder/puntonet     | ['kw1','kw2']      | ['folder/puntonet-2.0','folder/puntonet-4'] |
| folder/puntonet-2.0 | ['kw2','kw3','kw5'] | ['folder/puntonet','folder/puntonet-5']     |
| folder/puntonet-4   | ['kw2','kw4']      | ['folder/puntonet','folder/puntonet-2.0']   |
| folder/puntonet-5   | ['kw5','kw4']       | ['folder/puntonet-2.0','folder/puntonet-4'] |
+---------------------+---------------------+---------------------------------------------+

Solution

  • The output in the question does not match to the description.

    create table test (path String, arrKW Array(String))Engine=Memory as
    select * from values (('folder/puntonet',['kw1','kw2']),
    ('folder/puntonet-2.0',['kw2','kw3']),
    ('folder/puntonet-4',['kw2','kw4']),
    ('folder/puntonet-5',['kw5','kw4']));
    
    SELECT
        (arrayJoin(a) AS t).2 AS path,
        t.1 AS KW,
        arrayFilter(i -> (i != path), groupArrayArray(patha)) AS result
    FROM
    (
        SELECT
            groupArray(path) AS patha,
            groupArray((arrKW, path)) AS a,
            arrayJoin(arrKW) AS KW
        FROM test
        GROUP BY KW
    )
    GROUP BY t
    ORDER BY path ASC
    
    ┌─path────────────────┬─KW────────────┬─result────────────────────────────────────────────────────────┐
    │ folder/puntonet     │ ['kw1','kw2'] │ ['folder/puntonet-2.0','folder/puntonet-4']                   │
    │ folder/puntonet-2.0 │ ['kw2','kw3'] │ ['folder/puntonet','folder/puntonet-4']                       │
    │ folder/puntonet-4   │ ['kw2','kw4'] │ ['folder/puntonet','folder/puntonet-2.0','folder/puntonet-5'] │
    │ folder/puntonet-5   │ ['kw5','kw4'] │ ['folder/puntonet-4']                                         │
    └─────────────────────┴───────────────┴───────────────────────────────────────────────────────────────┘