Search code examples
sqlamazon-athenaprestotrino

Create json object from several columns programmatically in presto/trino/aws athena


I have a source table which have 300 columns and the number of columns might grow so I'm trying to construct a query where I want to insert into target table only a few columns and the other data must be merged in json object in a specific column.

For example source table looks like this:

column_a | column_b | column_c | column_d | several_other_columns....
---------------------------------------------------------------------
value_a  | value_b    | value_c    | value_d    | several_other_values......

And the target table should look like this:

column_a  | all_other_columns_combined_in_a_json
-------------------------------------------------
value_a   | {column_b: value_b, column_c: value_c, column_d: value_d, ........}

I can pick columns from information_schema like this:

select column_name
from information_schema.columns
where table_schema = 'source_db'
and table_name = 'source_table'
and column_name not in ('column_a')

but I do not understand how to pass those values into json_object() function

How can I achieve this if possible?


Solution

  • AFAIK Presto/Trino does not support dynamic SQL generation and execution, so currently the only option is to use "outside" scripting. When facing similar task some time ago I've ended up with AWS Lambda which fetched list of columns from information_schema.columns, generated needed SQL and executed corresponding query.

    Possibly something could be done via AWS user defined function but have not tried those.

    Just to verify - has started discussion @github.

    UPD

    It was confirmed by devs:

    currently there is no such functionality and I am not aware of any plans to build one