Search code examples
sqlclickhouse

Parse to Array from file text content in ClickHouse


I have text file ('textFile.csv) with CSV format using \t delimiter.

"regalo"    [('regalos',1000),('regalos originales',602),('regalos para hombres',601),('regalos para mujer',600)]
"125"       [('125 aniversario athletic',800),('125cc',601),('125/2',600),('125 moto',561),('125 entre 2',560),('125 entre 3',559),('125/5',558),('125 segunda mano',557)]
"camper"    [('camper segunda mano',601),('camperas',600)]

I would like to be able to transform the second field into a 'map' or array type element that I can use with clickhouse-local to, for example, obtain the term related to each value of field 1

If I try using

clickhouse-local --query="SELECT c1,arrayJoin(c2) FROM file('textFile.csv','TSV')"

Shows me

Exception: ARRAY JOIN requires array argument: While processing c1,arrayJoin(c2) AS cc

Using

clickhouse-local --query="SELECT c1,arrayJoin(groupArray(c2)) as cc,cc.1 FROM   file('textFile.csv','TSV') group by c1"

Shows me

Exception: First argument for function tupleElement must be tuple or array of tuple. Actual String

I'm struggling to break out of the loop. I've also tried using JSONExtract, but I can't transform the second text field into an array or map

My desired result would be

 RelatedText                | searches  | originText    |
| --------------------------| --------- | --------------|
|"regalos"                  | 1000      | "regalo"      |
|"regalos originales"       | 602       | "regalo"      |
|"regalos| para hombres"    | 60        | "regalo"      |
|"regalos para mujer"       | 600       | "regalo"      |
|"125 aniversario athletic" | 800       | "125"         |
|"125 moto"                 | 561       | "125"         |
|"125cc"                    | 601       | "125"         |
|"125/2"                    | 60        | "125"         |
|"camper segunda mano"      | 601       | "camper"      |
|"camperas"                 | 600       | "camper"      |
|"camper donostia"          | 562       | "camper"      |

Thanks!


Solution

  • This worked for me (using your data above):

    SELECT
        c1,
        untuple(arrayJoin(c2))
    FROM file('lino.tsv')
    
    Query id: e635b096-c183-45a2-83c3-44dbfe4bdb8d
    
    ┌─c1───────┬─tupleElement(arrayJoin(c2), 1)─┬─tupleElement(arrayJoin(c2), 2)─┐
    │ "regalo" │ regalos                        │                           1000 │
    │ "regalo" │ regalos originales             │                            602 │
    │ "regalo" │ regalos para hombres           │                            601 │
    │ "regalo" │ regalos para mujer             │                            600 │
    │ "125"    │ 125 aniversario athletic       │                            800 │
    │ "125"    │ 125cc                          │                            601 │
    │ "125"    │ 125/2                          │                            600 │
    │ "125"    │ 125 moto                       │                            561 │
    │ "125"    │ 125 entre 2                    │                            560 │
    │ "125"    │ 125 entre 3                    │                            559 │
    │ "125"    │ 125/5                          │                            558 │
    │ "125"    │ 125 segunda mano               │                            557 │
    │ "camper" │ camper segunda mano            │                            601 │
    │ "camper" │ camperas                       │                            600 │
    └──────────┴────────────────────────────────┴────────────────────────────────┘
    
    14 rows in set. Elapsed: 0.006 sec.