I have a datatable in Impala having a complex column with a structure: key and value. I want to show in each row the entire fields of that column: keys and values for each row, when I SELECT the data.
Desired table:
Id, map_tag
100, {building:yes, type:apartment, street:street_a, number:3}
101, {building:yes, type:hotel}
102, {building:yes, type: bank}
If I use:
SELECT id, GROUP_CONCAT(kv_pair) map_tag
FROM (
SELECT dt.id id, CONCAT(cc.key,':',cc.value) kv_pair
FROM datatable dt, datatable.complex_column cc
) T
GROUP BY id
Then I get one only Id, instead of many Ids. Here a screenshot of it:
According to Impala Complex Types doc, "references to fields within MAP columns use the KEY and VALUE pseudocolumns". This doc also provides the syntax to query MAP fields. So a combination of a SELECT
and GROUP_CONCAT
should do it:
SELECT id, GROUP_CONCAT(kv_pair) map_tag
FROM (
SELECT dt.id id, CONCAT(cc.key,':',cc.value) kv_pair
FROM datatable dt, dt.complex_column cc
) T
GROUP BY id
Query submitted at: 2020-08-24 12:34:17 (Coordinator: https://impalac:25000)
+-----+---------------------------------------------------------+
| id | map_tag |
+-----+---------------------------------------------------------+
| 102 | building:yes, type:bank |
| 100 | building:yes, type:apartment, street:street_a, number:3 |
| 101 | building:yes, type:hotel |
+-----+---------------------------------------------------------+
Fetched 3 row(s) in 0.42s
Please note that when referencing a map
field while doing join, you should use an alias for the "main" table if you did assign it an alias. I.e. in the above example, the inner SELECT
uses datatable dt
so the reference to complex_column
should look like dt.complex_column
(not actual_table.column_name which would be "normal").