python snowflake-cloud-data-platform histogram

Creating a histogram using the Snowpark Python API

In Snowflake SQL, there is the WIDTH_BUCKET function which can be used to create a histogram:

with hist as (
  select
    width_bucket(
      x,
      min(x) over (partition by null),
      max(x) over (partition by null),
      10
    ) as hist_bin
  from mydata
)

select hist_bin, count(*) as hist_count
from hist
group by 1
order by 1

It's tedious, but it works.

However, I don't see an equivalent width_bucket function in the Snowpark Python API.

Is there a straightforward equivalent in Snowpark?

Or do I also need to construct the buckets manually with a big ugly case expression?

Solution

You can access any built-in SQL function with snowflake.snowpark.functions.call_function

From the documentation examples:

>>> df = session.create_dataframe([1, 2, 3, 4], schema=["a"])  # a single column with 4 rows
>>> df.select(call_function("avg", col("a"))).show()

"AVG(""A"")"
2.500000

To access a user-defined function, use snowflake.snowpark.functions.call_udf