u-sql

Guid.NewGuid() always returns the same Guid for all rows


I need a unique GUID for every row I'm transforming from the source.
Below is a sample script; Guid.NewGuid() always returns the same value for all rows.

@Person =
    EXTRACT SourceId   int,
            AreaCode   string,
            AreaDetail string,
            City       string
    FROM "/Staging/Person"
    USING Extractors.Tsv(nullEscape:"#NULL#");

@rs1 =
    SELECT
        Guid.NewGuid() AS PersonId,
        AreaCode,
        AreaDetail,
        City
    FROM @Person;

OUTPUT @rs1
    TO "/Datamart/DimUser.tsv"
    USING Outputters.Tsv(quoting:false, dateTimeFormat:null);

Solution

  • A quick summary of the issue: you shouldn't try to assign unique values with techniques that generate new Guids, or with any other non-deterministic method such as a time-based one. The reason is that rows in U-SQL may be recalculated, for example due to vertex retries or performance optimizations.

    When that happens, the recalculated rows are assigned new values, which eventually leads to an error while the U-SQL script runs, because U-SQL requires rows to be deterministic with respect to the input data.

    Instead of assigning a new Guid, use the ROW_NUMBER window function, which can safely add new unique numbers to rows.

    @result =
        SELECT 
            *,
            ROW_NUMBER() OVER () AS UID
        FROM @Person;
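Applied to the script in the question, the change might look like the sketch below. It keeps the original EXTRACT and OUTPUT statements and only replaces Guid.NewGuid() with ROW_NUMBER(), so PersonId becomes an integer (long) rather than a Guid. The ORDER BY SourceId inside OVER is an addition not present in the answer above: it assumes SourceId uniquely identifies each row in /Staging/Person, so that a recalculated row is given the same number again. Treat it as an untested sketch.

```usql
// Extract the staging data exactly as in the original script.
@Person =
    EXTRACT SourceId   int,
            AreaCode   string,
            AreaDetail string,
            City       string
    FROM "/Staging/Person"
    USING Extractors.Tsv(nullEscape:"#NULL#");

// Assign a surrogate key with ROW_NUMBER instead of Guid.NewGuid().
// ORDER BY SourceId (assumed unique per row) ties each number to the
// input data, so recomputing a row reproduces the same PersonId.
@rs1 =
    SELECT
        ROW_NUMBER() OVER (ORDER BY SourceId) AS PersonId,
        AreaCode,
        AreaDetail,
        City
    FROM @Person;

OUTPUT @rs1
    TO "/Datamart/DimUser.tsv"
    USING Outputters.Tsv(quoting:false, dateTimeFormat:null);
```

If several rows can share a SourceId, order by a combination of columns that is unique instead; with ties, ROW_NUMBER may number the tied rows differently from one run to the next.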