Search code examples
u-sql

U-SQl declaring @variables as null


I have a master table which I am ingesting data to from a couple of source tables which I am not joining.

One of the tables doesn't have one of the columns so I want to insert Nulls into this column with an 'insert into' statement. To do this, I am trying to create a @missing_column variable.

DECLARE @missing_column string IS NULL;  // This does not work.

How would you recommend doing this? I know I can write the words "NULL" as a string and insert it as that. But that would not work for a column if it decimals.


Solution

  • U-SQL uses the C# type system, so you need to define types and assign values using the C# syntax. This means you will need to use a nullable type and assign the correctly typed null values:

    Because string is a nullable type already, you can simply use:

    DECLARE @missing_column string = (string)null;
    

    Though if you were trying to use a non-nullable type such as int, you would need to specify the nullable version in your variable and table column:

    DECLARE @missing_column int? = (int?)null;
    

    The official documentation on U-SQL DECLARE statements can be found here.


    Having re-read your question, for your specific example you should just be able to ignore the missing column in your INSERT statement. This will assign the values from the source table where assigned and leave the additional, unspecified target column null. To to this, you will obviously need to have a nullable type in the table column per what I have written above:

    Master table definition:

    CREATE TABLE dbo.MasterTarget
    (
      Col1 Int,
      Col2 string,    // string is already nullable
      Col3 DateTime?, // DateTime is not, so need to specify DateTime? that accepts null values
      INDEX clx_Col1 CLUSTERED(Col1 ASC)
    )
    DISTRIBUTED BY HASH(Col1);
    

    All column data load:

    @src1 =
        EXTRACT Col1 Int,
                Col2 string,
                Col3 DateTime
        FROM "/SourceFile1.csv"
        USING Extractors.Csv();
    
    INSERT INTO dbo.MasterTarget
    (
      Col1,
      Col2,
      Col3
    )
    SELECT Col1,
           Col2,
           Col3
    FROM @src1;
    

    Missing column data load:

    @src2 =
        EXTRACT Col1 Int,
                Col2 string
        FROM "/SourceFile2.csv"
        USING Extractors.Csv();
    
    INSERT INTO dbo.MasterTarget
    (
      Col1,
      Col2
    )
    SELECT Col1,
           Col2
    FROM @src2;