Search code examples
sqlt-sqlazure-synapseexternal-tables

Retrieve filename while using external table


I'm trying to retrieve filename while doing a select from an external table using Synapse Analytics.

I didn't find any way but I was thinking that in the REJECT_ROW_LOCATION ( when some rows are rejected) the file created contains the name of the file that is rejecting some rows.

Have You any ideas on how to proceed? Thanks in advance


Solution

  • Polybase sadly does not include the option to include the filename which I know can make debugging large file loads a bit tricky. You have a couple of options

    • single file - if you are using a single filepath, the filename will always be available in the LOCATION part of the CREATE EXTERNAL STATEMENT command. If you are using a wildcard, then at least the path will be available but not the full filename
    • add the filename into the file on extraction - a bit of a hack, but if you add the filename to the source file on extraction, then it will be present for Polybase to pick it up. This could be a good solution if you have good control over the extract process
    • COPY INTO allows some customisation of the column list and allows you to add defaults. Not sure if will allow an entire dummy column though, need to check
    • Synapse Pipeline $$FILEPATH syntax - use this syntax in a Copy activity to add the filename. See the link for a good example of that.
    • Spark input_file_name - we actually use Synapse Notebooks to load a lot of our data and adding input_file_name to the dataframe gives the filename. It's a different approach as it's not a virtualisation layer like Polybase but does the trick for us.