In Snowflake SQL, there is the WIDTH_BUCKET
function which can be used to create a histogram:
with hist as (
select
width_bucket(
x,
min(x) over (partition by null),
max(x) over (partition by null),
10
) as hist_bin
from mydata
)
select hist_bin, count(*) as hist_count
from hist
group by 1
order by 1
It's tedious, but it works.
However, I don't see an equivalent width_bucket
function in the Snowpark Python API.
Is there a straightforward equivalent in Snowpark?
Or do I also need to construct the buckets manually with a big ugly case
expression?
You can access any built-in SQL function with snowflake.snowpark.functions.call_function
From the documentation examples:
>>> df = session.create_dataframe([1, 2, 3, 4], schema=["a"]) # a single column with 4 rows
>>> df.select(call_function("avg", col("a"))).show()
"AVG(""A"")" |
---|
2.500000 |
To access a user-defined function, use snowflake.snowpark.functions.call_udf