i'm trying to visualise a series of data on a chart, ideally creating something that would show how the distribution lies. The data is in a snowflake db, and I've attempted to do this using percentiles. I've calculated the value at 10 % increments. My current query is as below, but this doesn't seem to give me a great output to visualise
select
APPROX_PERCENTILE(field_name, 0.1) "10_percent_value",
APPROX_PERCENTILE(field_name, 0.2) "20_percent_value",
APPROX_PERCENTILE(field_name, 0.3) "30_percent_value",
etc.
from table.name
I'm sure i'm going about this the wrong way, but would like some help on alternative methods that could work please to be able to support a curve type visualisation. Hopefully this is enough information?
Too bad Snowflake's percentile function requires 2nd argument to be a literal constant; I was hoping for a way to pass a column name to it to make this solution a bit more programmatic.
In terms of a getting a visualizable output, you need all percentile values to be in the same column. Here is what that would look like
create or replace temporary table percentiles (percentile_value int) as
select approx_percentile(field_name,0.1) from t
union all
select approx_percentile(field_name,0.2) from t
union all
select approx_percentile(field_name,0.3) from t
union all
select approx_percentile(field_name,0.4) from t
union all
select approx_percentile(field_name,0.5) from t
union all
select approx_percentile(field_name,0.6) from t
union all
select approx_percentile(field_name,0.7) from t
union all
select approx_percentile(field_name,0.8) from t
union all
select approx_percentile(field_name,0.9) from t
union all
select approx_percentile(field_name,1.0) from t;
select *, row_number() over (order by percentile_value)||'0th' as percentile
from percentiles;
If performance becomes an issue, you can use approx_percentile_accumulate() to store percentile state in a table and run calculations on that instead of the base table
Sample Output: