Search code examples
sqlsql-order-bylimitamazon-redshiftnetezza

Using a dynamic LIMIT in a SQL Query Redshift


I have a Redshift and Netezza query where the COUNT(*) grows whenever I run it, as most tables append rows with more sales, etc. Anyway, I'm looking for the top 20% by price, so I'm using ORDER BY price, then changing the limit by doing the mental math to get the top 20%, i.e. if COUNT(*)=10,000, then I'll add a static LIMIT 2000. Tomorrow, maybe it'll be 12,000 rows though, then I'll need to run the COUNT(*) and change the LIMIT 2400 again. This is obviously not scalable.

I'm trying to edit the query to include a dynamic limit, looking for the top 20% and easily changing between the percentages: 5%, 10%, etc. So no matter the count of rows, I get consistent results.

Example query below:

SELECT
 * 
FROM orders
WHERE
 sale_date >= '2018-01-01'
ORDER BY 
 price DESC
LIMIT 2000

I'd like to have something that's more like:

SELECT
 * 
FROM orders
WHERE
 sale_date >= '2018-01-01'
ORDER BY 
 price DESC
LIMIT (SELECT COUNT(*) * 0.2 FROM orders)

Not sure if Redshift/Netezza supports this functionality. I've seen ONE StackOverflow question where it was possible in MySQL using a variable function, but I'm working in Redshift and Netezza. Anyone have any ideas for how to create this? Thank you!


Solution

  • One method is row_number() and count(*):

    SELECT o.* 
    FROM (SELECT o.*,
                 ROW_NUMBER() OVER (ORDER BY price DESC) as seqnum,
                 COUNT(*) OVER () as cnt
          FROM orders o
          WHERE sale_date >= '2018-01-01'
         ) o
    WHERE seqnum <= cnt * 0.2
    ORDER BY price DESC;
    

    You don't have to use two functions. One will do. For instance:

    SELECT o.* 
    FROM (SELECT o.*,
                 PERCENT_RANK() OVER (ORDER BY price DESC) p
          FROM orders o
          WHERE sale_date >= '2018-01-01'
         ) o
    WHERE p <= 0.2
    ORDER BY price DESC;