Search code examples
mysqldatejoinmariadbquery-performance

Working with a MariaDB view that generates a lot of calculated statistics - How to move to a calculated table?


I currently have a MariaDB database that gets populated every day with different products (around 800) and also gets the price updates for these products.

I've created a view on top of the prices/products table that generates statistics such as the avg, mean and mode for the last 7, 15 and 30 days, and calculates the difference from today's price to the averages of 7, 15 and 30 days.

The problem is that whenever I run this view it takes almost 50 seconds to generate the data. I saw some comments about switching over to a calculated table, in which the calculations would be updated when new data is entered into the table, however I'm quite skeptical in doing that, as I'm inserting around 1000 price points at one specific time of the day that will impact all the calculations on the table. Is a calculated table something that updates only the rows that were updated, or it would recalculate everything? I'm worried about the overhead this might cause (memory is not an issue with the server).

I've pasted the products and prices tables and the view to DBFiddle, here: https://dbfiddle.uk/?rdbms=mariadb_10.2&fiddle=4cf594a85f950bed34f64d800601baa9

Calculations can be seen for product code 22141

Just to give an idea these are some of the calculations done by the view (available on the fiddle as well):

        ROUND((((SELECT preconormal
        FROM precos
        WHERE codigowine = vinhos.codigowine
            AND timestamp >= CURRENT_DATE - INTERVAL 9 HOUR) / (SELECT AVG(preconormal)
        FROM precos
        WHERE codigowine = vinhos.codigowine
            AND timestamp >= CURRENT_DATE - INTERVAL 7 DAY) - 1) * 100), 2) as dif_7_dias,
        ROUND((((SELECT preconormal
        FROM precos
        WHERE codigowine = vinhos.codigowine
            AND timestamp >= CURRENT_DATE - INTERVAL 9 HOUR) / (SELECT AVG(preconormal)
        FROM precos
        WHERE codigowine = vinhos.codigowine
            AND timestamp >= CURRENT_DATE - INTERVAL 15 DAY) - 1) * 100), 2) as dif_15_dias,
        ROUND((((SELECT preconormal
        FROM precos
        WHERE codigowine = vinhos.codigowine
            AND timestamp >= CURRENT_DATE - INTERVAL 9 HOUR) / (SELECT AVG(preconormal)
        FROM precos
        WHERE codigowine = vinhos.codigowine
            AND timestamp >= CURRENT_DATE - INTERVAL 30 DAY) - 1) * 100), 2) as dif_30_dias

If switching to a calculated table, is there an optimal way to do this?


Solution

  • Looking at your query: Try refactoring it to eliminate as many dependent subqueries as possible, and instead JOINing to subqueries. Eliminating those dependent subqueries will make a vast performance difference.

    Figuring the mode is an application of finding the detail record for an extreme value in a dataset. If you use this as a subquery

        WITH freq AS (
                SELECT COUNT(*) freq,
                       ROUND(preconormal, 2) preconormal,
                       codigowine
                  FROM precos
                  WHERE timestamp >= CURRENT_DATE - INTERVAL 7 DAY
                  GROUP BY  ROUND(preconormal, 2), codigowine
            ),
            most AS (
               SELECT MAX(freq) freq,
                      codigowine
                 FROM freq
                GROUP BY codigowine
           ),
           mode AS (
             SELECT GROUP_CONCAT(preconormal ORDER BY preconormal DESC) modeps,
                    freq.codigowine
               FROM freq
               JOIN most ON freq.freq = most.freq
              GROUP BY freq.codigowine
           )
           SELECT * FROM mode
    

    You can find the most frequent price for each item. The first CTE, freq, gets the prices and their frequencies.

    The second CTE, most, finds the frequency of the most frequent price (or prices).

    The third CTE, mode, extracts the most frequent prices from freq using a JOIN. It also uses GROUP_CONCAT() because it's possible to have more than one mode--most frequent price.

    For your stats you can do this:

    WITH s7 AS (
      SELECT ROUND(MIN(preconormal), 2) minp,
             ROUND(AVG(preconormal), 2) meanp,
             ROUND(MAX(preconormal), 2) maxp,
             codigowine
        FROM precos
       WHERE timestamp >= CURRENT_DATE - INTERVAL 7 DAY
       GROUP BY codigowine
    ),
    s15 AS (
      SELECT ROUND(MIN(preconormal), 2) minp,
             ROUND(AVG(preconormal), 2) meanp,
             ROUND(MAX(preconormal), 2) maxp,
             codigowine
        FROM precos
       WHERE timestamp >= CURRENT_DATE - INTERVAL 15 DAY
       GROUP BY codigowine
    ),
    s30 AS (
      SELECT ROUND(MIN(preconormal), 2) minp,
             ROUND(AVG(preconormal), 2) meanp,
             ROUND(MAX(preconormal), 2) maxp,
             codigowine
        FROM precos
       WHERE timestamp >= CURRENT_DATE - INTERVAL 30 DAY
       GROUP BY codigowine
    ),
    m7 AS (
       WITH freq AS (
             SELECT COUNT(*) freq,
                    ROUND(preconormal, 2) preconormal,
                    codigowine
               FROM precos
               WHERE timestamp >= CURRENT_DATE - INTERVAL 7 DAY
               GROUP BY  ROUND(preconormal, 2), codigowine
         ),
         most AS (
            SELECT MAX(freq) freq,
                   codigowine
              FROM freq
             GROUP BY codigowine
        ),
        mode AS (
          SELECT GROUP_CONCAT(preconormal ORDER BY preconormal DESC) modeps,
                 freq.codigowine
            FROM freq
            JOIN most ON freq.freq = most.freq
           GROUP BY freq.codigowine
        )
        SELECT * FROM mode
    )
    SELECT v.codigowine, v.nomevinho, DATE(timestamp) AS data_adc,
           s7.minp min_7_dias, s7.maxp max_7_dias,  s7.meanp media_7_dias, m7.modeps moda_7_dias,
           s15.minp min_15_dias, s15.maxp max_15_dias,  s15.meanp media_15_dias, 
           s30.minp min_30_dias, s30.maxp max_30_dias,  s30.meanp media_30_dias
      FROM vinhos v
      LEFT JOIN s7 ON v.codigowine = s7.codigowine
      LEFT JOIN m7 ON v.codigowine = m7.codigowine
      LEFT JOIN s15 ON v.codigowine = s15.codigowine
      LEFT JOIN s30 ON v.codigowine = s30.codigowine
    

    I'll leave it to you to do the modes for 15 and 30 days.

    This is quite the query. You better hope the next guy to work on it doesn't curse your name. :-)