I have the following code:
select *, lead(session_end_type) over (partition by user_id, session_id order by user_id, session_id, log_time) as next_session_end_type
from table_name;
However, it seems like this results in different results every time I run it.
What makes this difference?
Thanks in advance!
(I've checked that the code outputs different results through the following code:
create table t1
select *, lead(session_end_type) over (partition by user_id, session_id order by user_id, session_id, log_time) as next_session_end_type
from table_name;
create table t2
select *, lead(session_end_type) over (partition by user_id, session_id order by user_id, session_id, log_time) as next_session_end_type
from table_name;
select count (*) from
(
select * from t1
union
select * from t2
) as t;
The resulting row count is different from t1's row count and t2's row count; meaning that the result of t1 and t2 is different.)
First, there is no need to repeat the partition by
columns in the order by
. You can simplify this to:
lead(session_end_type) over (partition by user_id, session_id order by log_time) as next_session_end_type
Second, if log_time
is not unique for a given user_id
/session_id
, then the results are unstable. Remember, SQL tables represent unordered sets, so if there are ties in sort keys then there is no "natural" order to fall back on.
You can check this wtih:
select user_id, session_id, log_time, count(*)
from table_name
group by user_id, session_id, log_time
having count(*) > 1
order by count(*) desc;
If you do have a column that uniquely identifies each row (or each user/user session row), then include that in the order by
:
lead(session_end_type) over (partition by user_id, session_id
order by log_time, <make it stable column>) as next_session_end_type
)