
U-SQL Ignore Empty Files


I receive a daily dump of files from a data provider. Occasionally we receive empty files (20 bytes). Is there any way to automatically skip these files instead of processing them?

I have tried:

USING Extractors.Csv(skipFirstNRows:1, silent:true);

However, I still get a vertex failure, which I believe is caused by the empty files.


Solution

  • We recently added a FILE.LENGTH() file property that you can bind to a computed virtual column in EXTRACT and then use to filter out files of a certain size.

    For example, the following should only operate on files larger than 20 bytes:

    @data =
      EXTRACT
              // ... columns to extract
            , file_sz = FILE.LENGTH()   // size of the source file in bytes
      FROM "/mydata/{*}"
      USING Extractors.Csv();
    
    @res =
      SELECT *
      FROM @data
      WHERE file_sz > 20;   // keep only rows that come from files larger than 20 bytes
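
    A minimal sketch of the same filter combined with the extractor options from the question. The columns id and name and the output path are hypothetical placeholders, not taken from the question. It assumes, as the example above suggests, that the predicate on file_sz keeps the roughly 20-byte files from being processed:

    @data =
      EXTRACT id int,                    // hypothetical column
              name string,               // hypothetical column
              file_sz = FILE.LENGTH()    // size of the source file in bytes
      FROM "/mydata/{*}"
      USING Extractors.Csv(skipFirstNRows:1, silent:true);

    @res =
      SELECT id, name
      FROM @data
      WHERE file_sz > 20;                // drop the near-empty files

    OUTPUT @res
    TO "/output/result.csv"              // hypothetical output path
    USING Outputters.Csv(outputHeader:true);

    Selecting named columns instead of * keeps file_sz out of the output, since it is only needed for the filter.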