I have a datalake that gets sent data whenever the source system is updated. This can result in a single item being sent multiple times, for the multiple versions of it.
In the USQL, i can retrieve everything, then partition the data set and get my latest version of each item.
However, it doesn't look like variables are available in Views? I'd like a view for ease of access by other teams. e.g.
CREATE VIEW MyDatabase.DataLakeViews.LastestDataVersion
AS
@output =
EXTRACT MyKey string,
MyData string,
EventEnqueuedUtcTime DateTime
FROM @"adl://bwdatalakestore.azuredatalakestore.net/Stream/MGS/pts/sportsbook/betinfo/csv/2017/11/27/{*}.csv"
USING Extractors.Text(delimiter : '|', skipFirstNRows : 1);
@PartitionedOutput =
SELECT *,
ROW_NUMBER() OVER(PARTITION BY MyKey ORDER BY EventEnqueuedUtcTime DESC) AS RowNumber
FROM @output;
@FinalOutput =
SELECT *
FROM @PartitionedOutput
WHERE RowNumber == 1;
OUTPUT @FinalOutput
TO "/ReferenceGuide/QSE/Extract/SearchLog_extracted.txt"
USING Outputters.Tsv();
This doesn't work in a view. Is there a way to shorthand this partitioning, rather than putting into every query.
Possibly a way to achieve this via a custom extractor? It appears that it works by looping over each row, so maybe not suited here...
Views follow the SQL language of being defined without parameters on a single expression.
You want to parameterize the view, which is a table-valued function.