Search code examples
amazon-s3pathpowerbisubdirectoryamazon-athena

Athena table - file's path


I would like to create a table in athena by using the S3 bucket, but first I would want to know if there is any way to bring the name of the subdirectory (source data of the files) as a new column in this table. If so, which is the syntax I should be using? If this is not possible, I would need at least the path of the file in that new column.

The subdirectory name is needed for a mapping in Power BI.

Thank you in advance!


Solution

  • From Find the Source File for an Athena Table Row:

    • To find out which Amazon S3 file contains data that is returned by a specific row in an Athena table, run a SELECT $path query.
    • To find out which Athena table rows are associated with a specific Amazon S3 file, run a SELECT query with the WHERE $path condition.

    From CREATE EXTERNAL TABLE - Amazon Redshift:

    By default, Amazon Redshift creates external tables with the pseudocolumns $path and $size. Select these columns to view the path to the data files on Amazon S3 and the size of the data files for each row returned by a query. The $path and $size column names must be delimited with double quotation marks. A SELECT * clause doesn't return the pseudocolumns. You must explicitly include the $path and $size column names in your query, as the following example shows:

    select "$path", "$size"
    from spectrum.sales_part
    where saledate = '2008-12-01';