Search code examples
azure-blob-storagedata-warehouseazure-data-factorydelete-rowskip

Delete last record from flat file in blob to Azure data warehouse


I have some pipe delimited flat files in Blob storage and in each file I have a header and footer record with filename, date of extract and the number of records. I am using ADF pipeline with Polybase toload into Azure DWH. I could skip header record but unable to skip the footer. The only way I could think of is creating staging table with all varchar and load into staging and then convert the data types back into main tables. But that is not working as the number of columns is different to the footer and the data. Is there any easier way to do this? Please advise.


Solution

  • Polybase does not have an explicit option for removing footer rows but it does have a set of rejection options which you could potentially take advantage of. If you set your REJECT_TYPE as VALUE (rather than PERCENTAGE) and your REJECT_VALUE AS 1 you are telling Polybase to reject one row only. If your footer is in a different format to the main data rows, it will be rejected but your query should not fail.

    CREATE EXTERNAL TABLE yourTable
    ...    
    <reject_options> ::=  
    {  
        | REJECT_TYPE = value,  
        | REJECT_VALUE = 2 
    

    Please post a simple, anonmyised example of your file with headers, columns and footers if you need further assistance.

    Update: Check this blog post for information on tracking rejected rows:

    https://azure.microsoft.com/en-us/blog/load-confidently-with-sql-data-warehouse-polybase-rejected-row-location/