Search code examples
azure-data-lakeu-sql

u-sql fails to handle null


I am trying out azure datalake, and tried to test u-sql on my test file.

this is my test script

@input =
    EXTRACT TagName     string,
            time        DateTime,
            valuee      float,
            Quality     int
    FROM "/MCVD/{*}/{*}/{*}/{*}/{*}/{*}/{*}.csv"
    USING Extractors.Csv(skipFirstNRows:1, nullEscape:"");

OUTPUT @input
    TO "/output/Test-query.csv"
    USING Outputters.Csv();

When running in datalake analytics, i get an error with the column valuee at line 56.

This is the content of line 56

"M32_Amp_FurnaceAluTemperatureTubeReference",2018-02-08T14:48:53.5780000,"",1

I understand that this is because the column holding the value is empty (its the first empty line in the file). I tried to use various NullEscape paterns, but it continues to fail.

I tried to extract the column as a string, which didn't produce any errors.

this is the error:

Vertex failed
Vertex failure triggered quick job abort. Vertex failed: SV1_Extract[0][0] with error: Vertex user code error.
 VertexFailedFast: Vertex failed with a fail-fast error
 E_RUNTIME_USER_EXTRACT_ROW_ERROR: Error occurred while extracting row after processing 56 record(s) in the vertex' input split. Column index: 2, column name: 'valuee'.
 E_RUNTIME_USER_EXTRACT_COLUMN_CONVERSION_EMPTY_ERROR: Failure when attempting to convert empty column data.

Solution

  • U-SQL supports nullable data-types. Simply mark them with a question mark after the datatype, eg

    @input =
        EXTRACT TagName     string,
                time        DateTime,
                valuee      float?,
                Quality     int
        FROM [your input path]
        USING Extractors.Csv(skipFirstNRows:1, silent:true);
    

    You can also suppress error rows with the silent switch of the extractor set to true. I am however getting some different errors with your script and sample data. Is that an accurate repro?