dataframe snowflake-cloud-data-platform user-defined-functions

Creating a UDTF using the Snowpark Python API


I am looking for an example of creating a vectorized UDTF and registering it, all in Snowpark syntax. Can anybody provide a code snippet or links where I can see something like that?


Solution

  • Here's an example of how to create a vectorized UDTF, and here's an example of registering a UDF (it applies to vectorized ones as well).

    You can try something simple like this (make sure you have the latest version of Snowpark):

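    from snowflake.snowpark.functions import pandas_udtf
    from snowflake.snowpark.types import PandasDataFrameType, StringType, IntegerType, FloatType

    # Assumes `session` is an existing snowflake.snowpark.Session.
    class multiply_test:
        def __init__(self):
            self.multiplier = 10

        # Called once per partition; df is a pandas DataFrame holding that partition's rows.
        def end_partition(self, df):
            df.columns = ['id', 'col1', 'col2']
            df.col1 = df.col1*self.multiplier
            df.col2 = df.col2*self.multiplier
            yield df

    # Register the handler class as a vectorized UDTF.
    multiply_udtf = pandas_udtf(
        multiply_test,
        output_schema=PandasDataFrameType([StringType(), IntegerType(), FloatType()], ["id_", "col1_", "col2_"]),
        input_types=[PandasDataFrameType([StringType(), IntegerType(), FloatType()])],
        session=session
    )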

    Also the links provided by @Sergiu are great:

    Create a UDTF

    Register a UDTF
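
Before registering, it can help to check the handler logic locally. Since `end_partition` just receives a pandas DataFrame, the `multiply_test` class from the answer can be exercised with plain pandas and no Snowflake connection. This is only a rough sketch: the sample rows are made up for illustration, and it tests the Python logic only, not the registration or the type mapping.

```python
import pandas as pd


class multiply_test:
    def __init__(self):
        self.multiplier = 10

    def end_partition(self, df):
        # Name the incoming columns, then scale the two numeric ones.
        df.columns = ['id', 'col1', 'col2']
        df.col1 = df.col1 * self.multiplier
        df.col2 = df.col2 * self.multiplier
        yield df


# Hypothetical sample partition: one string column, one int, one float.
partition = pd.DataFrame([["x", 3, 35.9], ["x", 9, 20.5]])

# end_partition is a generator, so pull the single DataFrame it yields.
result = next(multiply_test().end_partition(partition))
print(result)
```

Once the UDTF is registered, the pattern shown in the Snowpark docs for calling it is along the lines of `df.select(multiply_udtf("id", "col1", "col2").over(partition_by=["id"]))`, where `df` is a Snowpark DataFrame with matching columns.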