Search code examples
sqlmysqlwindow-functions

COUNT(*) OVER(ORDER BY...) - the result of this window function is counterintuitive


So I've been melting my brain for the whole evening and still haven't come up with a sophisticated answer. I hope you can help!

So I have this table, made of unique IDs and registration time (200 inquiries):

USER ID REGISTRATION_TIME
1 '2021-01-01'
2 '2020-01-01'
3 '2019-01-01'
4 '2018-01-01'

...down to 200 IDs

This window function call

SELECT user_id, count(*) OVER(ORDER BY registration_time desc) as cnt, registration_time 
FROM user;

...returns the following (the middle attribute goes down to 200 rows):

USER ID REGISTRATION_TIME REGISTRATION_TIME
1 1 '2021-01-01'
2 2 '2020-01-01'
3 3 '2019-01-01'
4 5 '2018-01-01'
5 5 '2018-01-01'
6 6 '2017-01-01'

So, I see that COUNT(*) numbers every row (descending, according to registration_time, as per instruction) and increments it by 1 for each recurring inquiry.

How is that? What I know about COUNT, is that it calculates the number of rows in a given table. What I know about window functions - particularly the one without PARTITION BY - is that they gather all rows in one group.

What COUNT(*) does here, really resembles of what SUM(1) OVER(ORDER BY ... rows unbounded preceding) does (correct me if I'm wrong)

So, what I expect, is 1 for every row... But instead, I see an arithmetic sequence...

I wonder how exactly does it work? Thank you very much in advance!

P.S.. I tried Googling but didn't find something relevant...


Solution

  • From the MySQL 8.0 Window Function Frame Specification

    In the absence of a frame clause [stackoverflow edit: The "frame clause" being the UNBOUNDED PRECEDING that you noted], the default frame depends on whether an ORDER BY clause is present:

    With ORDER BY: The default frame includes rows from the partition start through the current row, including all peers of the current row (rows equal to the current row according to the ORDER BY clause). The default is equivalent to this frame specification:

    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    

    Which describes the results you are seeing