I'm trying to convert a scaleR script into T-SQL (with external_script execution) for in-database R analytics in SQL Server. The scaleR script that I'm working with is on the DeepDive Data Science Tutorial on Fraud data by MSDN.
All my data is in SQL Server now (from the tutorial), and all I want to do is query this table to get summary using the rxSummary
scaleR function (in-database).
This is my attempt:
exec sp_execute_external_script
@language = N'R',
@script = N'
sumOut <- rxSummary(
formula = ~gender + balance + numTrans + numIntlTrans + creditLine,
data = ccFraud
)
',
@input_data_1 = N'select * from [DeepDive].[db_datareader].[ccFraudSmall]',
@input_data_1_name = N'ccFraud',
@output_data_1_name = N'summary'
with result sets ((summary varchar(max) not null));
But this throws an error:
STDOUT message(s) from external script: Rows Read: 10000, Total Rows Processed: 10000, Total Chunk Time: Less than .001 seconds Computation time: 0.000 seconds. Msg 11536, Level 16, State 1, Line 5 EXECUTE statement failed because its WITH RESULT SETS clause specified 1 result set(s), but the statement only sent 0 result set(s) at run time.
Any idea how to deal with this?
Am I missing a step?
In your example, there is a mismatch between the R script and SQL parameter definitions. The output_data_1_name
parameter value specifies the name of the R dataframe that is being returned from R to SQL Server. But there is no value assigned to summary
in the R script. Hence you are getting a 0 results error. See sp_execute_external_script document for description and example.
If you are just trying to display the summary as messages, then you can add print(sumOut)
to the R script.
If you want to return data to SQL Server, you would need to build an R dataframe and assign to the output_1_data
.