Search code examples
mysqlsqlmariadbmariadb-10.2

5th percentile on MySQL (MariaDB)


I'm trying to find the 95th percentile (and the highest buy) of the item-price using the order in my ~300k row table.

I've been successful in finding the 95th percentile and the highest buy for one single item with this code:

 SELECT type_id,
       Max(price) AS buy,
       Min(price) AS '95th% buy'
FROM   (SELECT *,
               ( Row_number()
                   OVER (
                     partition BY type_id
                     ORDER BY price DESC) ) AS rownr
        FROM   orderbuffertest AS rownr
        WHERE  is_buy_order = 1
        ORDER  BY ( Row_number()
                      OVER (
                        partition BY type_id
                        ORDER BY price DESC) ) ASC) AS t1
WHERE  t1.type_id = 44992
       AND t1.rownr < (SELECT Count(*)
                       FROM   orderbuffertest
                       WHERE  is_buy_order = 1
                              AND type_id = 44992) * 0.05;  

However, now I'm trying to GROUP BY type_id and it's messing up all my values.

Does anybody have an idea of how to GROUP BY type_id this query? Maybe even ways to improve the original one?

I thank you in advance,

TheJozzle

Ps. Here's a link to my database, if you'd like to mess/test around with it: https://gofile.io/?c=Ga6ODr


Solution

  • This query should give you the results you want. It allocates a ROW_NUMBER by price as well as counting all rows for each type_id and order type (is_buy_order) in a CTE, then selects the MAX price as the buy price (for is_buy_order = 1), and the minumum price for rows >= the 95th percentile as the 95th percentile price. In the event that there are no rows in the 95th percentile other than the highest price, the second highest price is returned. Similar logic applies to the generation of the sell and 95th%sell prices:

    WITH prices AS (
      SELECT type_id, price, is_buy_order,
             ROW_NUMBER() OVER (PARTITION BY type_id, is_buy_order ORDER BY price DESC) AS rownr,
             COUNT(*) OVER (PARTITION BY type_id, is_buy_order) AS num_rows
      FROM   orderbuffertest
    )
    SELECT type_id,
           MAX(CASE WHEN is_buy_order = 1 THEN price END) AS buy,
           COALESCE(MIN(CASE WHEN is_buy_order = 1 AND 100.0 * (rownr - 1) / num_rows <= 5 AND rownr != 1 THEN price END), 
                    MAX(CASE WHEN is_buy_order = 1 AND rownr = 2 THEN price END)) AS `95th%buy`,
           MIN(CASE WHEN is_buy_order = 0 THEN price END) AS sell,
           COALESCE(MAX(CASE WHEN is_buy_order = 0 AND 100.0 * rownr / num_rows >= 95 AND rownr != num_rows THEN price END), 
                    MAX(CASE WHEN is_buy_order = 0 AND rownr = num_rows - 1 THEN price END)) AS `95th%sell`
    FROM prices
    GROUP BY type_id
    

    If you can't use CTEs for some reason, you could write the CTE as a subquery:

    SELECT type_id,
           MAX(CASE WHEN is_buy_order = 1 THEN price END) AS buy,
           COALESCE(MIN(CASE WHEN is_buy_order = 1 AND 100.0 * (rownr - 1) / num_rows <= 5 AND rownr != 1 THEN price END), 
                    MAX(CASE WHEN is_buy_order = 1 AND rownr = 2 THEN price END)) AS `95th%buy`,
           MIN(CASE WHEN is_buy_order = 0 THEN price END) AS sell,
           COALESCE(MAX(CASE WHEN is_buy_order = 0 AND 100.0 * rownr / num_rows >= 95 AND rownr != num_rows THEN price END), 
                    MAX(CASE WHEN is_buy_order = 0 AND rownr = num_rows - 1 THEN price END)) AS `95th%sell`
    FROM (
      SELECT type_id, price, is_buy_order,
             ROW_NUMBER() OVER (PARTITION BY type_id, is_buy_order ORDER BY price DESC) AS rownr,
             COUNT(*) OVER (PARTITION BY type_id, is_buy_order) AS num_rows
      FROM   orderbuffertest
    ) prices
    GROUP BY type_id
    

    Demo on dbfiddle