Search code examples
pythonsnowflake-cloud-data-platformuser-defined-functions

Using already deployed Snowflake UDF from snowpark


So I have a prebuilt and predeployed set of Python UDFs that I would like to use from within a snowpark program. I defined the UDFs in SQL:

CREATE OR REPLACE FUNCTION dg_utility__field_contains_phone_number(str string, pre_clean boolean)
RETURNS BOOLEAN
LANGUAGE PYTHON
RUNTIME_VERSION = 3.8
HANDLER = 'dg_utility__field_contains_phone_number'
AS
$$
.......udf stuff
$$;

Now I want to leverage those UDFs from snowpark python dsl. Something like:

df = df.withColumn(col, dg_utility__field_contains_phone_number(col))

Is there a way to do this, or am I required to redefine dg_utility__field_contains_phone_number within my snowpark program?

I want to be able to share UDFs between SQL and python. I think I could just use session.sql, but that would require me to generate dynamic SQL against the dataframe I want to apply these functions to and I'm looking to avoid that.


Solution

  • From the documentation for snowpark.functions.call_udf (note that if the UDF already exists, the session.udf.register line is not needed):

    from snowflake.snowpark.types import IntegerType
    from snowflake.snowpark.functions import call_udf
    udf_def = session.udf.register(lambda x, y: x + y, name="add_columns", input_types=[IntegerType(), IntegerType()], return_type=IntegerType(), replace=True)
    df = session.create_dataframe([[1, 2]], schema=["a", "b"])
    df.select(call_udf("add_columns", col("a"), col("b"))).show()
    
    ADD_COLUMNS("A", "B")
    3