r azure-data-lake u-sql

Execute R inside U-SQL


I'm trying to use U-SQL and R for forecasting, so I need to pass a list of values from U-SQL to R and return the forecast from R to U-SQL.

All the examples I found use a reducer, so they will process only one row.

https://learn.microsoft.com/en-us/azure/data-lake-analytics/data-lake-analytics-u-sql-r-extensions

Is it possible to send R a list of rows to process, instead of a list of columns?

Thanks!


Solution

  • By definition, user-defined reducers take n rows and produce one or more rows, so you can use them to produce new rows as well as new column data. The R extensions for U-SQL include a built-in reducer (Extension.R.Reducer) that runs R code on each vertex assigned to the reducer. Inside the R script, the input rowset is available through the special variable "inputFromUSQL", and you can work on it with ordinary R code.

    As in the documentation you referenced, this should work on all rows at once:

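    // The R extensions assembly must be referenced before using Extension.R.Reducer.
    REFERENCE ASSEMBLY [ExtR];

    // R script; inputFromUSQL holds the rows passed in by the reducer.
    DECLARE @myRScript = @"
    inputFromUSQL$mydata = as.factor(inputFromUSQL$mydata)
    <..>
    ";
    
    @myData = <my u-sql query>;
    
    // The <..> below stands for the rest of the REDUCE statement (its ON and PRODUCE clauses).
    @RScriptOutput = REDUCE @myData <..>
    USING new Extension.R.Reducer(command:@myRScript, rReturnType:"dataframe");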
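
To make the "all rows at once" point concrete, here is a minimal sketch of what the R side could look like for a forecast. It is an illustration under assumptions, not the asker's script: the columns Period and Value are hypothetical, the linear trend is only a stand-in for a real forecasting model, and inputFromUSQL is built by hand so the snippet can be run locally. In an actual job the extension supplies inputFromUSQL, and, as far as I understand the documentation linked in the question, the result is handed back through a variable named outputToUSQL.

```r
# Stand-in for the rowset that Extension.R.Reducer would supply.
# In a real job this variable is provided by the extension; the
# column names Period and Value are hypothetical.
inputFromUSQL <- data.frame(
  Period = 1:6,
  Value  = c(10, 12, 14, 16, 18, 20)
)

# Fit a simple linear trend over ALL input rows at once
# (placeholder for whatever forecasting model is actually needed).
fit <- lm(Value ~ Period, data = inputFromUSQL)

# Build the periods to forecast: the three periods after the last input row.
horizon <- 3
future <- data.frame(Period = max(inputFromUSQL$Period) + seq_len(horizon))
future$Forecast <- as.numeric(predict(fit, newdata = future))

# Hand the result back as a data frame, matching rReturnType:"dataframe".
outputToUSQL <- future
```

The script sees every row of the group as one data frame, so the number of output rows (here, one per forecast period) does not have to match the number of input rows. The columns returned would need to line up with whatever the PRODUCE clause of the REDUCE statement declares.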