Search code examples
sqlapache-sparkhivehql

hql: lead then create table, get wrong lead data


Bad case:

CREATE TABLE IF NOT EXISTS tmp12 AS
   WITH tmp_table AS
(
SELECT dt, time, LEAD(time,1,'99999') OVER(PARTITION BY urs) AS last_time, urs, tag FROM
    (
        SELECT dt, urs, time, tag, count(1) FROM table0 LIMIT 10000
    ) t1
GROUP BY dt, urs, time, tag
ORDER BY urs, time
)

Then

select * from tmp12

I get wrong data like:

WRONG RESULT(dt, time, last_time, urs)

There are some last_time < time.

When I remove CREATE TABLE, I get a good case:

SELECT dt, time, LEAD(time,1,'99999') OVER(PARTITION BY  urs) AS last_time, urs, tag FROM
    (
        SELECT dt, urs, time, tag, count(1) FROM table0 LIMIT 10000
    ) t1
GROUP BY dt, urs, time, tag
ORDER BY urs, time

with correct result like:

GOOD RESULT(dt, time, last_time, urs)

all the last_time > time.

Why? I just create table and then select from it, but the last_time becomes wrong?


Solution

  • Could you try this:

    CREATE TABLE IF NOT EXISTS tmp12 AS
       WITH tmp_table AS
    (
    SELECT dt, time, LEAD(time,1,'99999') OVER(PARTITION BY urs ORDER BY time) AS last_time, urs, tag FROM
        (
            SELECT dt, urs, time, tag, count(1) FROM table0 LIMIT 10000
        ) t1
    GROUP BY dt, urs, time, tag
    ORDER BY urs, time
    )
    

    I am guessing that the missing order by in the window function is the issue.