Search code examples
azure-data-lakeu-sql

USQL Custom Extractor - Latest Version


I have a datalake that gets sent data whenever the source system is updated. This can result in a single item being sent multiple times, for the multiple versions of it.

In the USQL, i can retrieve everything, then partition the data set and get my latest version of each item.

However, it doesn't look like variables are available in Views? I'd like a view for ease of access by other teams. e.g.

CREATE VIEW MyDatabase.DataLakeViews.LastestDataVersion
    AS

@output = 
    EXTRACT MyKey string,
            MyData string,
            EventEnqueuedUtcTime DateTime
    FROM @"adl://bwdatalakestore.azuredatalakestore.net/Stream/MGS/pts/sportsbook/betinfo/csv/2017/11/27/{*}.csv"
    USING Extractors.Text(delimiter : '|', skipFirstNRows : 1);

@PartitionedOutput =
    SELECT *,
           ROW_NUMBER() OVER(PARTITION BY MyKey ORDER BY EventEnqueuedUtcTime DESC) AS RowNumber
    FROM @output;

@FinalOutput = 
    SELECT  *
    FROM    @PartitionedOutput
    WHERE   RowNumber == 1;

OUTPUT @FinalOutput
TO "/ReferenceGuide/QSE/Extract/SearchLog_extracted.txt"
USING Outputters.Tsv();

This doesn't work in a view. Is there a way to shorthand this partitioning, rather than putting into every query.

Possibly a way to achieve this via a custom extractor? It appears that it works by looping over each row, so maybe not suited here...


Solution

  • Views follow the SQL language of being defined without parameters on a single expression.

    You want to parameterize the view, which is a table-valued function.