Search code examples
sqloracle-databaseplsqloracle19c

What should go in a COUNT (*) OVER window function?


I'm looking at an Oracle stored procedure which as the following (paraphrased) sql

select AB.*, count(*) over (PARTITION BY 
    TO_CHAR(AB.CAN_DATE,'MONTH'),
    TO_CHAR(AB.CAN_DATE,'MM'),
    'XMEG'
    ORDER BY
TO_CHAR(AB.CAN_DATE,'MONTH'),
    TO_CHAR(AB.CAN_DATE,'MM'),
    'XMEG') from mytable AB;

Apparently this code works OK, but I have several questions

  1. surely the to_char (...MM) is redundant, as the to_char(....Month) will do the same job?
  2. what is the value of an order by clause in a COUNT(*) OVER? Does it have any effect? Why is it allowed.
  3. what does it mean to include a constant ('XMEG') in the partition / order by. I'm guessing it will have no effect?

Solution

  • Your query will count the values for each month regardless of year. So all the values in October will be counted together regardless of whether it was October 2023, October 1900 or October 9999. If you want the count for individual months then you should use TRUNC(ab.can-date, 'MM') or TO_CHAR(ab.can_date, 'YYYY-MM') rather than TO_CHAR with just the month.

    1. surely the to_char (...MM) is redundant, as the to_char(....Month) will do the same job?
    • TO_CHAR(AB.CAN_DATE,'MONTH') finds the name of the month
    • TO_CHAR(AB.CAN_DATE,'MM') finds the number of the month

    Since both are unique to the month then they will have the same effect in the PARTITION BY clause and, yes, the second one is redundant.

    1. what is the value of an order by clause in a COUNT(*) OVER? Does it have any effect? Why is it allowed.

    If you wanted a cumulative (running) total starting from the earliest date to the last date then you could use an ORDER BY clause. If you do not have an ORDER BY clause then the count will be the total over the entire partition.

    For example, if you have the sample data:

    CREATE TABLE table_name (dt) AS
      SELECT DATE '1900-10-01' FROM DUAL UNION ALL
      SELECT DATE '1970-10-31' FROM DUAL UNION ALL
      SELECT DATE '2023-10-01' FROM DUAL UNION ALL
      SELECT DATE '2023-10-05' FROM DUAL UNION ALL
      SELECT DATE '2023-10-13' FROM DUAL UNION ALL
      SELECT DATE '2023-10-13' FROM DUAL UNION ALL
      SELECT DATE '2023-10-28' FROM DUAL UNION ALL
      SELECT DATE '9999-10-31' FROM DUAL UNION ALL
      SELECT DATE '2023-12-31' FROM DUAL;
    

    Then:

    SELECT dt,
           TO_CHAR(dt, 'MONTH') AS month_name,
           TO_CHAR(dt, 'MM') AS month_number,
           COUNT(dt) OVER (PARTITION BY TO_CHAR(dt, 'MONTH')) AS total_by_name,
           COUNT(dt) OVER (PARTITION BY TO_CHAR(dt, 'MM')) AS total_by_number,
           COUNT(dt) OVER (PARTITION BY TRUNC(dt, 'MM')) AS total_by_month,
           COUNT(dt) OVER (PARTITION BY TO_CHAR(dt, 'MONTH') ORDER BY dt)
             AS running_total_by_name
    FROM   table_name
    

    Outputs:

    DT MONTH_NAME MONTH_NUMBER TOTAL_BY_NAME TOTAL_BY_NUMBER TOTAL_BY_MONTH RUNNING_TOTAL_BY_NAME
    2023-12-31 00:00:00 DECEMBER 12 1 1 1 1
    1900-10-01 00:00:00 OCTOBER 10 8 8 1 1
    1970-10-31 00:00:00 OCTOBER 10 8 8 1 2
    2023-10-01 00:00:00 OCTOBER 10 8 8 5 3
    2023-10-05 00:00:00 OCTOBER 10 8 8 5 4
    2023-10-13 00:00:00 OCTOBER 10 8 8 5 6
    2023-10-13 00:00:00 OCTOBER 10 8 8 5 6
    2023-10-28 00:00:00 OCTOBER 10 8 8 5 7
    9999-10-31 00:00:00 OCTOBER 10 8 8 1 8

    fiddle

    And you can see that the final column has a running total (where values with the same ORDER BY are counted together).

    In your case, the PARTITION BY and ORDER BY clauses are identical so every value in a partition will have the same value and will be ordered identically so the ORDER BY clause, in your case, is irrelevant and could be removed.

    1. what does it mean to include a constant ('XMEG') in the partition / order by. I'm guessing it will have no effect?

    In this case, it has no effect.

    You can do COUNT(*) OVER () if you want to count over the entire result set without any partitions. If you wanted to include a PARTITION BY clause and still get the same effect then you could use COUNT(*) OVER (PARTITION BY 'a constant') but that is about the only reason I can think to include a constant (and even then you could just omit the PARTITION BY clause).