Search code examples
sqlgroup-bysubquerydata-warehousefact-table

SQL filtering counts in a fact table using a subquery or better alternative


I am trying to summaries data from my Data Warehouse into a fact table and I want to be able to count up the number of transactions per type that there were per customer per month.

Columns being used are: Customer_id, Transaction_id, transaction_date, Transaction_type

Ideally what I want to get is.

Customer Month transaction_type_1 transaction_type_2 Total_transactions
12345 1 18 8 26
12345 2 23 14 37
67891 1 14 22 36

I have to put it into a subquery but I get the total number of type 1 transactions for all customers in each month. I have tried unsuccessfully to using partition on top of that but now very far outside my level.

Select 
 customer_id, 
 month, 
 count(transactions_id),
 (select count(transactions_id) from DWH where transaction_type = 1),
 (select count(transactions_id) from DWH where transaction_type = 2)
FROM DWH
GROUP BY customer_id, month

Incorrect table output looks something like this.

Customer Month transaction_type_1 transaction_type_2 Total_transactions
12345 1 432 564 26
12345 2 456 765 37

In a standalone table I can get the information, but I can not incorporate it into the fact table view.

Standalone this works to get individual counts of each type, but I haven't been able to rework that into a select subquery:

select customer_id, month, count(*) 
   FROM DWH 
   WHERE dwh.transaction_type = 1
   Group BY dwh.customer_id, month;

Any help would be much appreciated.


Solution

  • You may be getting incorrect results because the filters in the where clause of your respective subqueries do not consider the group by columns i.e.

    The table alias d helps us to distinguish between columns used in the outer/general query and the subqueries.

    select 
     d.customer_id, 
     d.month, 
     count(d.transactions_id),
     (
         select count(transactions_id) 
         from DWH 
         where transaction_type = 1 and 
               customer_id = d.customer_id and
               month = d.month
     ) as transaction_type_1,
     (
         select count(transactions_id) 
         from DWH 
         where transaction_type = 2 and 
               customer_id = d.customer_id and
               month = d.month
     ) as transaction_type_2
    FROM DWH d
    GROUP BY d.customer_id, d.month
    

    However, while this approach may work, it would be best if you tested this performance wise on your respective database and evaluate the cost metrics/query plan.

    Another approach that may be performant uses case expressions to achieve the result and has been included below:

    SELECT
         customer_id as Customer,
         month,
         COUNT(
             CASE WHEN transaction_type=1 THEN transactions_id END
         ) as transaction_type_1,
         COUNT(
             CASE WHEN transaction_type=2 THEN transactions_id END
         ) as transaction_type_2,
         COUNT(1) as Total_transactions
    FROM
        DWH
    GROUP BY
        customer_id, month