python amazon-web-services amazon-redshift udf

Aggregate UDFs with Python in Redshift

I managed to write a few scalar functions with Python in AmazonRedshift, i.e. taking one or a few columns as input and returning a single value based on some logic or transformation.

But is there any way to pass all the values of a numeric column(i.e. a list) in a UDF and calculate statistics on those, for example the mean or standard deviation ?

Solution

The documentation states only scalar udf function is possible (see http://docs.aws.amazon.com/redshift/latest/dg/user-defined-functions.html).

However you may cheat if the value list is not too huge by creating a string scalar udf expecting a string list, result of LISTAGG function execution.

eg: select udfSum(listagg(val,'|')) from table;

see: http://docs.aws.amazon.com/redshift/latest/dg/r_LISTAGG.html