I have a Redshift and Netezza query where the COUNT(*)
grows whenever I run it, as most tables append rows with more sales, etc. Anyway, I'm looking for the top 20% by price, so I'm using ORDER BY price
, then changing the limit by doing the mental math to get the top 20%, i.e. if COUNT(*)=10,000
, then I'll add a static LIMIT 2000
. Tomorrow, maybe it'll be 12,000 rows though, then I'll need to run the COUNT(*)
and change the LIMIT 2400
again. This is obviously not scalable.
I'm trying to edit the query to include a dynamic limit, looking for the top 20% and easily changing between the percentages: 5%, 10%, etc. So no matter the count of rows, I get consistent results.
Example query below:
SELECT
*
FROM orders
WHERE
sale_date >= '2018-01-01'
ORDER BY
price DESC
LIMIT 2000
I'd like to have something that's more like:
SELECT
*
FROM orders
WHERE
sale_date >= '2018-01-01'
ORDER BY
price DESC
LIMIT (SELECT COUNT(*) * 0.2 FROM orders)
Not sure if Redshift/Netezza supports this functionality. I've seen ONE StackOverflow question where it was possible in MySQL using a variable function, but I'm working in Redshift and Netezza. Anyone have any ideas for how to create this? Thank you!
One method is row_number()
and count(*)
:
SELECT o.*
FROM (SELECT o.*,
ROW_NUMBER() OVER (ORDER BY price DESC) as seqnum,
COUNT(*) OVER () as cnt
FROM orders o
WHERE sale_date >= '2018-01-01'
) o
WHERE seqnum <= cnt * 0.2
ORDER BY price DESC;
You don't have to use two functions. One will do. For instance:
SELECT o.*
FROM (SELECT o.*,
PERCENT_RANK() OVER (ORDER BY price DESC) p
FROM orders o
WHERE sale_date >= '2018-01-01'
) o
WHERE p <= 0.2
ORDER BY price DESC;