Search code examples
apache-sparkapache-spark-sqldatabricksuser-defined-functionsdatabricks-sql

In Databricks Workbook using Spark SQL, how to pass parameters thru SQL UDF functions?


Previously posted in Databricks Community: https://community.databricks.com/t5/community-discussions/can-we-pass-parameters-thru-sql-udf-s/td-p/50752

Is it possible to pass parameters from a SQL UDF thru to another SQL UDF that is called by the first SQL UDF?

Below is an example where I would like to call tbl_filter() from tbl_func() by passing the tbl_func.a_val parameter to tbl_filter().

Obviously, I could just implement the logic in one function, but in my use case that means repeating the tbl_filter() part of the query over and over in different versions of tbl_func(), which defeats the purpose of packaging code in functions. Ideally, there would be a single version of tbl_filter() to maintain that could be called from a variety of different versions of tbl_func() with whatever parameters I pass to tabl_func().

If not, what is the work around for something like this?

CREATE OR REPLACE TEMPORARY VIEW test_tbl AS
WITH a AS
(SELECT explode(sequence(1, 10)) AS a),
b AS
(SELECT explode(sequence(50, 60)) AS b)
SELECT * FROM a CROSS JOIN b;

CREATE OR REPLACE TEMPORARY FUNCTION tbl_filter(a_val INT)
    RETURNS TABLE(a INT, b INT)
    RETURN
    SELECT * FROM test_tbl tf
    WHERE tf.a = tbl_filter.a_val;

CREATE OR REPLACE TEMPORARY FUNCTION tbl_func(a_val INT, b_val INT)
    RETURNS TABLE(a INT, b INT)
    RETURN
    SELECT * FROM tbl_filter(tbl_func.a_val) tf
    WHERE tf.b = tbl_func.b_val;

-- This executes
-- select * from tbl_filter(1);

-- This does not: Error in SQL statement: AnalysisException: could not resolve `tbl_filter` to a table-valued function.
select * from tbl_func(1, 60);

Solution

  • When using temporary functions, the function is only available for the current session. Apparently, the inner SQL code in the tbl_func function uses a different session and cannot resolve the name.

    The simplest solution is to use permanent functions and view:

    CREATE OR REPLACE VIEW mydb.test_tbl AS
    WITH a AS
    (SELECT explode(sequence(1, 10)) AS a),
    b AS
    (SELECT explode(sequence(50, 60)) AS b)
    SELECT * FROM a CROSS JOIN b;
    
    CREATE OR REPLACE FUNCTION mydb.tbl_filter(a_val INT)
        RETURNS TABLE(a INT, b INT)
        RETURN
        SELECT * FROM mydb.test_tbl tf
        WHERE tf.a = tbl_filter.a_val;
    
    CREATE OR REPLACE FUNCTION mydb.tbl_func(a_val INT, b_val INT)
        RETURNS TABLE(a INT, b INT)
        RETURN
        SELECT * FROM mydb.tbl_filter(tbl_func.a_val) tf
        WHERE tf.b = tbl_func.b_val;
    
    -- This executes
    select * from mydb.tbl_filter(1);
    
    -- This also executes
    select * from mydb.tbl_func(1, 60);
    

    query_results