Search code examples
sqlamazon-s3hivecreate-tabletrino

Include filename and path in external location when creating a table in Trino


I have a S3 bucket named test-bucket with the following structure

- test-bucket 
  - surveys
    - abc.json
    - def.json
    [...]

Now I create a table in Trino like this

create table hive.qual2.surveys 
(
    id VARCHAR,
    name VARCHAR
)
with (
    format = 'JSON',
    external_location = 's3://test-bucket/surveys'
);

select data from this table works fine. However, now I also need to know where the information is coming from. So I need to enter a column with e.g a value abc.json if the data is coming from this file.

Any clue how I can achieve this?


Solution

  • As described in the docs for the hive connector provided by trino, I could just use the $path column to see the file where the information is coming from.

    to extract the file name without an extension i simply use this:

    split(element_at(split("$path", '/'),-1),'.')[1] as survey_id

    so the whole query looks like this after i created the table as described above.

    select
      *,
      split(element_at(split("$path", '/'),-1),'.')[1] as survey_id
    from hive.qual2.surveys;
    

    including $path in the create table definition does not make sense, since it already exists as a hidden column.