I have a S3 bucket named test-bucket
with the following structure
- test-bucket
- surveys
- abc.json
- def.json
[...]
Now I create a table in Trino like this
create table hive.qual2.surveys
(
id VARCHAR,
name VARCHAR
)
with (
format = 'JSON',
external_location = 's3://test-bucket/surveys'
);
select
data from this table works fine. However, now I also need to know where the information is coming from. So I need to enter a column with e.g a value abc.json
if the data is coming from this file.
Any clue how I can achieve this?
As described in the docs for the hive connector provided by trino, I could just use the $path
column to see the file where the information is coming from.
to extract the file name without an extension i simply use this:
split(element_at(split("$path", '/'),-1),'.')[1] as survey_id
so the whole query looks like this after i created the table as described above.
select
*,
split(element_at(split("$path", '/'),-1),'.')[1] as survey_id
from hive.qual2.surveys;
including $path
in the create table definition does not make sense, since it already exists as a hidden column.