
Pivot purchases by hour


Apologies, this is similar to a question I recently asked that was answered.

I have a query below that tracks customer purchases by hour in order to determine when a customer is likely to place an order. The query appears to work, but I'm stuck on a few issues and am looking for help.

First, instead of NULLs I want to display 0 (zero) for the hours in which no purchase was made. I think COALESCE() is the right way to proceed, but I ran into syntax errors.

Secondly, I want to display the customer's FIRST_NAME and LAST_NAME after the customer_id, and the total purchases after hour 23 for each customer_id. I am thinking of a LEFT JOIN, as I want to display customers that have no purchases too; in my test case below that would be customer_id 2.

Below is my test case and sample data. As always, if there is a better or simpler way to code this I would appreciate any input.

ALTER SESSION SET NLS_TIMESTAMP_FORMAT = 'DD-MON-YYYY  HH24:MI:SS.FF';

CREATE TABLE customers (CUSTOMER_ID, FIRST_NAME, LAST_NAME) AS
SELECT 1, 'Faith', 'Aaron' FROM DUAL UNION ALL
SELECT 2, 'Lisa', 'Jones' FROM DUAL UNION ALL
SELECT 3, 'Roz', 'Doyle' FROM DUAL;

create table purchases(
      ORDER_ID NUMBER GENERATED BY DEFAULT AS IDENTITY (START WITH 1) NOT NULL,
      customer_id   number,
      PRODUCT_ID NUMBER,
      QUANTITY NUMBER,
      purchase_date timestamp
    );

insert  into purchases (customer_id, product_id, quantity, purchase_date)
    select  1 customer_id, 102 product_id, 1 quantity,
   TIMESTAMP '2024-04-03 00:00:00' + INTERVAL '23:27' HOUR TO MINUTE + ((LEVEL-1) * INTERVAL '1 00:00:01' DAY TO SECOND)  * -1 +   ((LEVEL-1) * interval '0.007125' second) 
           as purchase_date
    from    dual
    connect by level <= 3 UNION all
select  1, 101, 1,
   TIMESTAMP '2024-05-10 00:00:57' + INTERVAL '07:17' HOUR TO MINUTE + ((LEVEL-1) * INTERVAL '1 00:00:01' DAY TO SECOND)  * -1 +   ((LEVEL-1) * interval '0.000120' second) 
    from    dual
    connect by level <= 2 UNION all
select  1, 101, 1,
   TIMESTAMP '2024-06-13 00:00:59.999999' + INTERVAL '23:14' HOUR TO MINUTE + ((LEVEL-1) * INTERVAL '1 00:00:00' DAY TO SECOND)  * -1 +   ((LEVEL-1) * interval '0.999999' second) 
    from    dual
    connect by level <= 1 UNION all
select  3, 100, 1,
   TIMESTAMP '2024-06-16 00:00:00.888999' + INTERVAL '00:37' HOUR TO MINUTE + ((LEVEL-1) * INTERVAL '1 00:00:00' DAY TO SECOND)  * -1 +   ((LEVEL-1) * interval '0.999999' second) 
    from    dual
    connect by level <= 1 UNION all
select  3, 103, 3,
   TIMESTAMP '2024-06-09 00:00:00' + INTERVAL '17:37' HOUR TO MINUTE + ((LEVEL-1) * INTERVAL '1 00:00:00' DAY TO SECOND)  * -1 +   ((LEVEL-1) * interval '0.009120' second) 
    from    dual
    connect by level <= 6;

SELECT *
 FROM
  (   
     SELECT   
      customer_id, 
SUBSTR(TO_CHAR(PURCHASE_DATE ,'HH24'),1,2) tm
    FROM purchases)
PIVOT
(   
  SUM (1)   
  FOR TM IN ('00','01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18','19','20','21','22','23'
   )
) pv
ORDER BY customer_id;


Solution

  • First, instead of output of NULLS I want to display 0 (zero) for those hours a purchase wasn't made. I think COALESCE() is the right way to proceed but ran into syntax errors.

    Use COUNT instead of SUM: COUNT returns 0 when there are no matching rows, whereas SUM returns NULL. However, if you OUTER JOIN the customers table after the PIVOT, a customer with no purchases (customer_id 2 in your sample data) has no pivoted row at all, so every hour column is NULL whichever aggregate you use. In that case you will have to use COALESCE (and you could then leave it as SUM).
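    A quick sketch of both points, run against the sample tables above (the aliases cnt, sum_x and h23_or_0 are just for illustration): the first query aggregates over zero rows, and the second trims the pivot to hour 23 only and then outer joins customers.

```sql
-- Aggregating over zero rows: COUNT gives 0 but SUM gives NULL.
SELECT COUNT(x) AS cnt,      -- 0
       SUM(x)   AS sum_x     -- NULL
FROM   (SELECT 1 AS x FROM DUAL WHERE 1 = 0);

-- After an outer join, a customer with no purchases has no pivoted row,
-- so its pivoted column is NULL even though COUNT was used.
SELECT c.customer_id,
       pv.h23,                          -- 4, NULL, 0 for customers 1, 2, 3
       COALESCE(pv.h23, 0) AS h23_or_0  -- 4, 0, 0
FROM   ( SELECT customer_id,
                EXTRACT(HOUR FROM purchase_date) AS hour
         FROM   purchases
       )
       PIVOT ( COUNT(1) FOR hour IN (23 AS h23) ) pv
       RIGHT OUTER JOIN customers c
       ON c.customer_id = pv.customer_id
ORDER BY c.customer_id;
```

    Customer 3 gets 0 from COUNT because it has purchases, just none at hour 23; customer 2 only becomes 0 through COALESCE.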

    You are probably getting syntax errors because you have not given the pivoted columns aliases, so Oracle names them after the IN-list literals, quotes included (for example '00'), and to reference such a column you would need the quoted identifier "'00'". It is easier to alias the columns so that you do not have to use quoted identifiers.
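    For example, keeping your string-based pivot but trimming it to two hours for brevity, the unaliased columns have to be referenced with quoted identifiers, while aliased ones can be used directly:

```sql
-- Unaliased: the columns are named '00' and '23' (with the quotes),
-- so they must be written as the quoted identifiers "'00'" and "'23'".
SELECT customer_id,
       COALESCE("'00'", 0) AS h00,
       COALESCE("'23'", 0) AS h23
FROM   (SELECT customer_id, TO_CHAR(purchase_date, 'HH24') AS tm FROM purchases)
       PIVOT ( SUM(1) FOR tm IN ('00', '23') )
ORDER BY customer_id;

-- Aliased in the IN list: the columns get ordinary names H00 and H23.
SELECT customer_id,
       COALESCE(h00, 0) AS h00,
       COALESCE(h23, 0) AS h23
FROM   (SELECT customer_id, TO_CHAR(purchase_date, 'HH24') AS tm FROM purchases)
       PIVOT ( SUM(1) FOR tm IN ('00' AS h00, '23' AS h23) )
ORDER BY customer_id;
```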

  • Secondly, I want to display the customers FIRST_NAME, LAST_NAME after the customer_id

    OUTER JOIN the customers table: use a LEFT OUTER JOIN if customers comes before the pivoted subquery in the FROM clause, or a RIGHT OUTER JOIN if it comes after it (as in the query below).
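    A sketch of the LEFT OUTER JOIN form, with customers listed first and the pivot wrapped in an inline view. Only the hours that occur in the sample data (0, 7, 17 and 23) are listed to keep it short; in practice you would extend the IN list to all 24 hours.

```sql
-- customers is on the left, so customers with no purchases are kept
-- and their (NULL) pivoted columns are turned into 0 with COALESCE.
SELECT c.customer_id, c.first_name, c.last_name,
       COALESCE(pv.h0, 0)  AS h0,
       COALESCE(pv.h7, 0)  AS h7,
       COALESCE(pv.h17, 0) AS h17,
       COALESCE(pv.h23, 0) AS h23
FROM   customers c
       LEFT OUTER JOIN (
         SELECT *
         FROM   ( SELECT customer_id,
                         EXTRACT(HOUR FROM purchase_date) AS hour
                  FROM   purchases
                )
                PIVOT ( COUNT(1) FOR hour IN (0 AS h0, 7 AS h7, 17 AS h17, 23 AS h23) )
       ) pv
       ON c.customer_id = pv.customer_id
ORDER BY c.customer_id;
```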

  • and the total purchases after hour 23 for each customer_id.

    Add the hourly counts together to get the total:

    SELECT c.*,
           COALESCE(h0, 0) AS h0,
           COALESCE(h1, 0) AS h1,
           COALESCE(h2, 0) AS h2,
           COALESCE(h3, 0) AS h3,
           COALESCE(h4, 0) AS h4,
           COALESCE(h5, 0) AS h5,
           COALESCE(h6, 0) AS h6,
           COALESCE(h7, 0) AS h7,
           COALESCE(h8, 0) AS h8,
           COALESCE(h9, 0) AS h9,
           COALESCE(h10, 0) AS h10,
           COALESCE(h11, 0) AS h11,
           COALESCE(h12, 0) AS h12,
           COALESCE(h13, 0) AS h13,
           COALESCE(h14, 0) AS h14,
           COALESCE(h15, 0) AS h15,
           COALESCE(h16, 0) AS h16,
           COALESCE(h17, 0) AS h17,
           COALESCE(h18, 0) AS h18,
           COALESCE(h19, 0) AS h19,
           COALESCE(h20, 0) AS h20,
           COALESCE(h21, 0) AS h21,
           COALESCE(h22, 0) AS h22,
           COALESCE(h23, 0) AS h23,
           COALESCE(h0, 0)
           + COALESCE(h1, 0)
           + COALESCE(h2, 0)
           + COALESCE(h3, 0)
           + COALESCE(h4, 0)
           + COALESCE(h5, 0)
           + COALESCE(h6, 0)
           + COALESCE(h7, 0)
           + COALESCE(h8, 0)
           + COALESCE(h9, 0)
           + COALESCE(h10, 0)
           + COALESCE(h11, 0)
           + COALESCE(h12, 0)
           + COALESCE(h13, 0)
           + COALESCE(h14, 0)
           + COALESCE(h15, 0)
           + COALESCE(h16, 0)
           + COALESCE(h17, 0)
           + COALESCE(h18, 0)
           + COALESCE(h19, 0)
           + COALESCE(h20, 0)
           + COALESCE(h21, 0)
           + COALESCE(h22, 0)
           + COALESCE(h23, 0) AS total
    FROM   ( SELECT customer_id, 
                    EXTRACT(HOUR FROM PURCHASE_DATE) AS hour
             FROM   purchases
           )
           PIVOT (   
             COUNT(1)   
             FOR hour IN (
                0 AS H0,   1 AS H1,   2 AS H2,   3 AS H3,   4 AS H4,   5 AS H5,
                6 AS H6,   7 AS H7,   8 AS H8,   9 AS H9,  10 AS H10, 11 AS H11,
               12 AS H12, 13 AS H13, 14 AS H14, 15 AS H15, 16 AS H16, 17 AS H17,
               18 AS H18, 19 AS H19, 20 AS H20, 21 AS H21, 22 AS H22, 23 AS H23
            )
          ) pv
          RIGHT OUTER JOIN customers c
          ON c.customer_id = pv.customer_id
    ORDER BY c.customer_id;
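    If you would rather not add up 24 COALESCE terms, one option is to compute the per-customer total with an analytic COUNT in the source subquery. Because it is not the pivoted column, PIVOT treats it as a grouping column and passes it through (it is constant per customer, so there is still one row per customer). This is a sketch with the IN list again trimmed to the hours that occur in the sample data:

```sql
SELECT c.customer_id, c.first_name, c.last_name,
       COALESCE(pv.h0, 0)    AS h0,
       COALESCE(pv.h7, 0)    AS h7,
       COALESCE(pv.h17, 0)   AS h17,
       COALESCE(pv.h23, 0)   AS h23,
       COALESCE(pv.total, 0) AS total
FROM   ( SELECT customer_id,
                EXTRACT(HOUR FROM purchase_date) AS hour,
                -- per-customer purchase count, repeated on every row
                COUNT(*) OVER (PARTITION BY customer_id) AS total
         FROM   purchases
       )
       PIVOT ( COUNT(1) FOR hour IN (0 AS h0, 7 AS h7, 17 AS h17, 23 AS h23) ) pv
       RIGHT OUTER JOIN customers c
       ON c.customer_id = pv.customer_id
ORDER BY c.customer_id;
```

    Because the analytic count is taken before the pivot, TOTAL counts every purchase, including any in hours left out of the IN list. For the sample data it gives 6, 0 and 7 for customers 1, 2 and 3.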
    

    Alternatively, rather than using PIVOT, you could use conditional aggregation:

    SELECT c.customer_id,
           MAX(c.first_name) AS first_name,
           MAX(c.last_name) AS last_name,
           COUNT(CASE EXTRACT(HOUR FROM p.purchase_date) WHEN 0 THEN 1 END) AS H0,
           COUNT(CASE EXTRACT(HOUR FROM p.purchase_date) WHEN 1 THEN 1 END) AS H1,
           COUNT(CASE EXTRACT(HOUR FROM p.purchase_date) WHEN 2 THEN 1 END) AS H2,
           COUNT(CASE EXTRACT(HOUR FROM p.purchase_date) WHEN 3 THEN 1 END) AS H3,
           COUNT(CASE EXTRACT(HOUR FROM p.purchase_date) WHEN 4 THEN 1 END) AS H4,
           COUNT(CASE EXTRACT(HOUR FROM p.purchase_date) WHEN 5 THEN 1 END) AS H5,
           COUNT(CASE EXTRACT(HOUR FROM p.purchase_date) WHEN 6 THEN 1 END) AS H6,
           COUNT(CASE EXTRACT(HOUR FROM p.purchase_date) WHEN 7 THEN 1 END) AS H7,
           COUNT(CASE EXTRACT(HOUR FROM p.purchase_date) WHEN 8 THEN 1 END) AS H8,
           COUNT(CASE EXTRACT(HOUR FROM p.purchase_date) WHEN 9 THEN 1 END) AS H9,
           COUNT(CASE EXTRACT(HOUR FROM p.purchase_date) WHEN 10 THEN 1 END) AS H10,
           COUNT(CASE EXTRACT(HOUR FROM p.purchase_date) WHEN 11 THEN 1 END) AS H11,
           COUNT(CASE EXTRACT(HOUR FROM p.purchase_date) WHEN 12 THEN 1 END) AS H12,
           COUNT(CASE EXTRACT(HOUR FROM p.purchase_date) WHEN 13 THEN 1 END) AS H13,
           COUNT(CASE EXTRACT(HOUR FROM p.purchase_date) WHEN 14 THEN 1 END) AS H14,
           COUNT(CASE EXTRACT(HOUR FROM p.purchase_date) WHEN 15 THEN 1 END) AS H15,
           COUNT(CASE EXTRACT(HOUR FROM p.purchase_date) WHEN 16 THEN 1 END) AS H16,
           COUNT(CASE EXTRACT(HOUR FROM p.purchase_date) WHEN 17 THEN 1 END) AS H17,
           COUNT(CASE EXTRACT(HOUR FROM p.purchase_date) WHEN 18 THEN 1 END) AS H18,
           COUNT(CASE EXTRACT(HOUR FROM p.purchase_date) WHEN 19 THEN 1 END) AS H19,
           COUNT(CASE EXTRACT(HOUR FROM p.purchase_date) WHEN 20 THEN 1 END) AS H20,
           COUNT(CASE EXTRACT(HOUR FROM p.purchase_date) WHEN 21 THEN 1 END) AS H21,
           COUNT(CASE EXTRACT(HOUR FROM p.purchase_date) WHEN 22 THEN 1 END) AS H22,
           COUNT(CASE EXTRACT(HOUR FROM p.purchase_date) WHEN 23 THEN 1 END) AS H23,
           COUNT(p.purchase_date) AS total
    FROM   customers c
           LEFT OUTER JOIN purchases p
           ON c.customer_id = p.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id;
    

    For the sample data, both queries output:

    CUSTOMER_ID FIRST_NAME LAST_NAME H0 H1 H2 H3 H4 H5 H6 H7 H8 H9 H10 H11 H12 H13 H14 H15 H16 H17 H18 H19 H20 H21 H22 H23 TOTAL
    1 Faith Aaron 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 6
    2 Lisa Jones 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    3 Roz Doyle 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 7
