Tags: sql, apache-spark, apache-spark-sql, window-functions

SQL query to find the max value by comparing adjacent rows


I have a problem finding the max value by comparing adjacent values.

id  date        record
1   2022-01-01  100
2   2022-01-02  10
3   2022-01-03  5
4   2022-01-04  20
5   2022-01-06  22
6   2022-01-10  30

and so on, for 1000+ records

Ordered by "date", the first row's "record" should be compared with the second row's and the max returned, then the second row's with the third row's, and so on.

So the final result should return:

100,10,20,22,30...

Please help with an SQL query.
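The pairwise comparison described above can be illustrated outside SQL with a few lines of Python (an illustration of the requested logic only; the sample values are taken from the table above):

```python
# Sample "record" values in date order, taken from the table above.
records = [100, 10, 5, 20, 22, 30]

# Compare each value with the next one and keep the larger of the pair;
# the last row has no successor, so it produces no pair.
pairwise_max = [max(a, b) for a, b in zip(records, records[1:])]
print(pairwise_max)  # [100, 10, 20, 22, 30]
```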


Solution

  • I'm expressing the solution below in standard SQL, so it works in any relational database. Spark SQL's LEAD() is also a window function and takes the same OVER() clause as standard SQL.

    WITH
    -- your input, names changed to avoid reserved words
    indata(id,dt,rec) AS (
              SELECT 1,DATE '2022-01-01',100
    UNION ALL SELECT 2,DATE '2022-01-02',10
    UNION ALL SELECT 3,DATE '2022-01-03',5
    UNION ALL SELECT 4,DATE '2022-01-04',20
    UNION ALL SELECT 5,DATE '2022-01-06',22
    UNION ALL SELECT 6,DATE '2022-01-10',30
    )
    -- end of input, real query starts here ...
    SELECT
      *
    , LEAD(rec) OVER(ORDER BY dt) as nextrec
    , GREATEST(LEAD(rec) OVER(ORDER BY dt),rec) AS biggerofpair
    FROM indata;
    
    id  dt          rec  nextrec  biggerofpair
    1   2022-01-01  100  10       100
    2   2022-01-02  10   5        10
    3   2022-01-03  5    20       20
    4   2022-01-04  20   22       22
    5   2022-01-06  22   30       30
    6   2022-01-10  30   (null)   (null)
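To sanity-check the query without a Spark cluster, the same logic can be reproduced in SQLite from Python (a sketch; SQLite has no GREATEST(), but its two-argument scalar MAX() behaves the same way, and it returns NULL when either argument is NULL, matching the last row above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE indata (id INTEGER, dt TEXT, rec INTEGER)")
rows = [(1, "2022-01-01", 100), (2, "2022-01-02", 10), (3, "2022-01-03", 5),
        (4, "2022-01-04", 20), (5, "2022-01-06", 22), (6, "2022-01-10", 30)]
conn.executemany("INSERT INTO indata VALUES (?, ?, ?)", rows)

# SQLite substitutes its scalar two-argument MAX() for GREATEST().
query = """
SELECT id, dt, rec,
       LEAD(rec) OVER (ORDER BY dt)           AS nextrec,
       MAX(rec, LEAD(rec) OVER (ORDER BY dt)) AS biggerofpair
FROM indata
ORDER BY dt
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The `biggerofpair` column comes back as 100, 10, 20, 22, 30, NULL, matching the result table above.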