Search code examples
sqlpostgresqlquery-optimizationpostgresql-performance

Calculate ratio of counts in PostgreSQL


I have a table mydata in a Postgres 14 database, with the following relevant columns:

  • ftype: an enum having foo, bar and baz
  • status: another enum having pending, failed, success

I want the success rate of different types. Success rate is basically: the number of rows where the status is success divided by total number of rows for that ftype.

Currently, I'm doing the following:

SELECT
    COALESCE(
        COUNT(CASE WHEN ftype = 'foo' AND status = 'success' THEN 1 END) / 
        NULLIF(COUNT(CASE WHEN ftype = 'foo' THEN 1 END)::real, 0)
    ,0)::real AS foo_rate,

    COALESCE(
        COUNT(CASE WHEN ftype = 'bar' AND status = 'success' THEN 1 END) / 
        NULLIF(COUNT(CASE WHEN ftype = 'bar' THEN 1 END)::real, 0)
    ,0)::real AS bar_rate,

    COALESCE(
        COUNT(CASE WHEN ftype = 'baz' AND status = 'success' THEN 1 END) / 
        NULLIF(COUNT(CASE WHEN ftype = 'baz' THEN 1 END)::real, 0)
    ,0)::real AS baz_rate,
FROM mydata;

Is there a better/more performant way? How can I optimize it?
Would using PARTITION in the query help?


Solution

  • Since you are computing rates for all types, a pivoted result would be simpler:

    SELECT ftype, CASE WHEN ct_success = 0 THEN 0 ELSE ct_success/ct_total END AS rate
    FROM  (
       SELECT ftype
            , count(*) AS ct_total
            , count(*) FILTER (WHERE status = 'success')::real AS ct_success
       FROM   mydata
       GROUP  BY 1
       ) sub;
    

    Works for a selection of types, too. (The benefit is smaller, though.) Filter early:

    SELECT ftype, CASE WHEN ct_success = 0 THEN 0 ELSE ct_success/ct_total END AS rate
    FROM  (
       SELECT ftype
            , count(*) AS ct_total
            , count(*) FILTER (WHERE status = 'success')::real AS ct_success
       FROM   mydata
       WHERE  ftype = ANY ('{foo,bar,baz}'::my_enum_type[])  -- here!
       GROUP  BY 1
       ) sub;
    

    About the WHERE clause:

    You may want to round() and/or use numeric to begin with. See:

    About the aggregate FILTER clause:

    Whether table partitioning might help is hard to tell from the minimal information we have. There are useful guidelines in the manual chapter "Overview" for "Table Partitioning". My educated guess is that it wouldn't benefit you.