Search code examples
pythonamazon-web-servicesamazon-redshiftudf

Aggregate UDFs with Python in Redshift


I managed to write a few scalar functions with Python in AmazonRedshift, i.e. taking one or a few columns as input and returning a single value based on some logic or transformation.

But is there any way to pass all the values of a numeric column(i.e. a list) in a UDF and calculate statistics on those, for example the mean or standard deviation ?


Solution

  • The documentation states only scalar udf function is possible (see http://docs.aws.amazon.com/redshift/latest/dg/user-defined-functions.html).

    However you may cheat if the value list is not too huge by creating a string scalar udf expecting a string list, result of LISTAGG function execution.

    eg: select udfSum(listagg(val,'|')) from table;

    see: http://docs.aws.amazon.com/redshift/latest/dg/r_LISTAGG.html