Search code examples
sqlpostgresqlwindow-functions

ROW_NUMBER(), PARTITION_BY, TOP 2 MAX If MAX 1st and Last Position


I have a query that uses ROW_NUMBER() and PARTITION BY and orders based on a date col. What I would like to know is if it is possible to restrict this and return only col1's that match if the second max is in position 1 and current max is in position 250.

SELECT t1.col1, t1.col_date, t1.col_val, t1.rno FROM (
  SELECT col1, col_date, col_val, count(*), ROW_NUMBER() OVER (PARTITION BY col1 order by col_date ASC) as rno
FROM table_one
INNER JOIN
  table_two
ON
  table_one.id = table_one_id
WHERE col1 in (
  SELECT
    col1
  FROM
    table_one
  INNER JOIN table_two on table_one.id = table_one_id
  GROUP
    by table_one.id
  HAVING
    COUNT(*) >= 250
  ) 
  GROUP BY col1, col_date, col_val
) t1
WHERE t1.rno < 250
GROUP BY t1.col1, t1.col_date, t1.col_val, t1.rno
ORDER BY t1.col1, t1.col_date;

IE-

 col1 |           col_date           | col_val | rno
--------+----------------------------+--------+-----
 ABC   | 2018-07-18 15:27:35.394051  |   999   |   1
...
 ABC   | 2019-03-24 15:27:34.78493   |   1000  | 250

 XYZ   | 2018-07-18 15:27:35.394051  |   900   |   1
...
 XYZ   | 2019-03-24 15:27:34.78493   |   1001  | 250

Would both be returned if col_val is second max at rno 1 and col_val is max at rno 250.

[UPDATE]

For clarity: Two Tables. table_one is a list of stock symbols. table_two is a list of historic prices. Current query selects the symbols from table_one and joins the historic prices from table_two that have 250+ records for each symbol and the cut off by rno is 250 so the output will look like this.

symbol |     market_close_date      | close  | rno
--------+----------------------------+--------+-----
 FAKE   | 2018-07-18 15:27:35.394051 |  250.0 |   1
 FAKE   | 2018-07-19 15:27:35.391866 |  249.0 |   2
 FAKE   | 2018-07-20 15:27:35.389615 |  248.0 |   3
 FAKE   | 2018-07-21 15:27:35.38741  |  247.0 |   4
 FAKE   | 2018-07-22 15:27:35.3852   |  246.0 |   5
 FAKE   | 2018-07-23 15:27:35.383099 |  245.0 |   6
 FAKE   | 2018-07-24 15:27:35.380934 |  244.0 |   7
 FAKE   | 2018-07-25 15:27:35.378828 |  243.0 |   8
 FAKE   | 2018-07-26 15:27:35.376769 |  242.0 |   9
...
 FAKE   | 2019-03-24 15:27:34.78493  | 1000.0 | 250
 TEST   | 2018-07-18 15:27:35.396232 |  250.0 |   1
 ...
 TEST   | 2018-07-18 15:27:35.64352  |  50.0 | 250

If rno 1 > 2...249 and rno 1...249 < rno 250, this would be valid for FAKE because rno 1 is the greatest close up to 249 and 250 is the greatest close, TEST would not meet the criteria.


Solution

  • I think you can make use of the FILTER and PARTITION BY keywords here. This won't be a copy and paste solution but I think this will point you in the right direction. I used 10 as a row limit because I didn't want to come up with fake data but a 250 can be applied there.

    The table test in this example would be a result of your initial join between the two tables. Selecting all data (or desired fields) and adding ROW_NUMBER

    Idea:

    1. Inner most query: Select desired data and apply ROW_NUMBER
    2. Middle query: Apply the row limit to the initial results (where the row number is <= x). Select the max of everything under row limit x and the max of the entire set including the row limit x
    3. Outermost query: Ensure that row number 1 is equal to max (highest close where rno < x) and ensure that row number x is equal to the max in the set (highest close where rno = x)

    SELECT *
    FROM
    (
        SELECT  *
                , MAX(close) FILTER(WHERE rno < 10) OVER(PARTITION BY symbol) as "max"
                , MAX(close) OVER(PARTITION BY symbol) AS second_max
        FROM 
        (
            SELECT *, ROW_NUMBER() OVER(PARTITION BY symbol ORDER BY close_date) AS rno
            FROM test
        ) t
        WHERE t.rno <= 10
    ) t2
    WHERE (t2.rno = 1 AND t2.close = t2.max)
       OR (t2.rno = 10 AND t2.close = t2.second_max)
    ;
    

    Here is a SQLFiddle showing this idea.