Query to get total distinct values side by side with group by values

How do I get a column that shows a count of total distinct values side by side with a "group by" count of distinct values.

For example, in the query below I'm getting counts of distinct patients in each value set. I would also like to get the total count of distinct patients as well. How do I do this? The query below works but does not get me the counts for the total number of patients. This happens to be in Databricks SQL/Apache Spark but I imagine has a generalized solution across most sql database implementations.

select
  value_set_name,
  oid value_set_oid,
  count(distinct rx.patient_id) as count
  -- ??? count of total distinct patient_id's ???
from
  rx
  join value_set vs on rx.code = vs.code
group by 1,2
order by 1

Solution

A subquery is possibly the simplest method:

select value_set_name,
       oid as value_set_oid,
       count(distinct rx.patient_id) as count,
       (select count(distinct rx2.patient_id) from rx rx2) as num_total_distinct
  -- ??? count of total distinct patient_id's ???
from rx join
     value_set vs
     on rx.code = vs.code
group by 1,2
order by 1;

Because you are using count(distinct), you cannot simply sum the total count(distinct)s from all rows -- patients in more than on group would be counted multiple times.

Your query is actually pretty simple. It might be faster to do two levels of aggregation:

select value_set_name, value_set_oid, count(*) as num_patients,
       sum(count(*)) over () as num_total_patients
from (select value_set_name, oid as value_set_oid, rx_patient_id, count(*) as cnt
      from rx join
           value_set vs
           on rx.code = vs.code
      group by 1, 2, 3
     ) rv
group by 1, 2
order by 1;