I have the following SQL queries that should written similar results wrt offset column, but dont.
Query 1:
SELECT visitor_id, array_agg(timestamp) as time, array_agg(offset) as offset_list
from
(SELECT * FROM
(
SELECT visitor_id, timestamp,
cast(json_extract(uri_args, '$.offset') AS int) as offset
FROM table_t
where year = 2023 and month = 1 and day = 27 and
request_uri = '/home_page')
order by visitor_id, timestamp)
group by visitor_id
order by cardinality(offset_list) desc
Query 2:
SELECT visitor_id ,array_agg(offset) as offset_list
from
(SELECT * FROM
(
SELECT visitor_id, timestamp,
cast(json_extract(uri_args, '$.offset') AS int) as offset
FROM table_t
where year = 2023 and month = 1 and day = 27 and
request_uri = '/home_page')
order by visitor_id, timestamp)
group by visitor_id
order by cardinality(offset_list) desc
Here uri_args is simply a json file which under the key 'offset' contains the value of the offset for the particular API response. This is from response log table of a server.
Although the two queries are similar and according to me ought to return the same values in the offset_list column i find the following discrepancy :
I will consider a particular visitor_id to convey it clearly, for a visitor_id ='12345' query i returns the following row in the offset_list col
[0, 0, 0, 10, 0, 10, 20, 32, 42, 0, 0, 20, 53, 77, 57, 0, 10, 20, 31, 10, 41, 0, 10, 41, 54, 77, 0, 10, 31, 41, 54, 10, 31, 54, 57, 77, 10, 20, 32, 0, 10, 21, 33, 44, 72, 52, 0, 10, 20, 31, 41]
and for query 2 the output is as follows :
[20, 32, 42, 0, 0, 20, 53, 77, 57, 0, 10, 20, 31, 10, 41, 0, 10, 41, 54, 77, 0, 10, 31, 41, 54, 10, 31, 54, 77, 57, 10, 20, 32, 0, 10, 21, 33, 44, 72, 52, 0, 10, 20, 31, 41, 0, 0, 0, 10, 0, 10]
I can observe the the two are circular permutations of each other but fail to see why this is happening. Please help me understand what the difference is the inner working of each query. The first reply suits the intent of my quest which is to capture the visitors journey on the homepage.
If you need your array elements deterministically ordered specify order by
clause for aggregate function as mentioned in the docs
Some aggregate functions such as
array_agg()
produce different results depending on the order of input values. This ordering can be specified by writing anORDER BY
Clause within the aggregate function
SELECT visitor_id ,array_agg(offset order by timestamp) as offset_list
-- ...