I currently have a MariaDB database that gets populated every day with different products (around 800) and also gets the price updates for these products.
I've created a view on top of the prices/products table that generates statistics such as the avg, mean and mode for the last 7, 15 and 30 days, and calculates the difference from today's price to the averages of 7, 15 and 30 days.
The problem is that whenever I run this view it takes almost 50 seconds to generate the data. I saw some comments about switching over to a calculated table, in which the calculations would be updated when new data is entered into the table, however I'm quite skeptical in doing that, as I'm inserting around 1000 price points at one specific time of the day that will impact all the calculations on the table. Is a calculated table something that updates only the rows that were updated, or it would recalculate everything? I'm worried about the overhead this might cause (memory is not an issue with the server).
I've pasted the products and prices tables and the view to DBFiddle, here: https://dbfiddle.uk/?rdbms=mariadb_10.2&fiddle=4cf594a85f950bed34f64d800601baa9
Calculations can be seen for product code 22141
Just to give an idea these are some of the calculations done by the view (available on the fiddle as well):
ROUND((((SELECT preconormal
FROM precos
WHERE codigowine = vinhos.codigowine
AND timestamp >= CURRENT_DATE - INTERVAL 9 HOUR) / (SELECT AVG(preconormal)
FROM precos
WHERE codigowine = vinhos.codigowine
AND timestamp >= CURRENT_DATE - INTERVAL 7 DAY) - 1) * 100), 2) as dif_7_dias,
ROUND((((SELECT preconormal
FROM precos
WHERE codigowine = vinhos.codigowine
AND timestamp >= CURRENT_DATE - INTERVAL 9 HOUR) / (SELECT AVG(preconormal)
FROM precos
WHERE codigowine = vinhos.codigowine
AND timestamp >= CURRENT_DATE - INTERVAL 15 DAY) - 1) * 100), 2) as dif_15_dias,
ROUND((((SELECT preconormal
FROM precos
WHERE codigowine = vinhos.codigowine
AND timestamp >= CURRENT_DATE - INTERVAL 9 HOUR) / (SELECT AVG(preconormal)
FROM precos
WHERE codigowine = vinhos.codigowine
AND timestamp >= CURRENT_DATE - INTERVAL 30 DAY) - 1) * 100), 2) as dif_30_dias
If switching to a calculated table, is there an optimal way to do this?
Looking at your query: Try refactoring it to eliminate as many dependent subqueries as possible, and instead JOINing to subqueries. Eliminating those dependent subqueries will make a vast performance difference.
Figuring the mode is an application of finding the detail record for an extreme value in a dataset. If you use this as a subquery
WITH freq AS (
SELECT COUNT(*) freq,
ROUND(preconormal, 2) preconormal,
codigowine
FROM precos
WHERE timestamp >= CURRENT_DATE - INTERVAL 7 DAY
GROUP BY ROUND(preconormal, 2), codigowine
),
most AS (
SELECT MAX(freq) freq,
codigowine
FROM freq
GROUP BY codigowine
),
mode AS (
SELECT GROUP_CONCAT(preconormal ORDER BY preconormal DESC) modeps,
freq.codigowine
FROM freq
JOIN most ON freq.freq = most.freq
GROUP BY freq.codigowine
)
SELECT * FROM mode
You can find the most frequent price for each item. The first CTE, freq
, gets the prices and their frequencies.
The second CTE, most
, finds the frequency of the most frequent price (or prices).
The third CTE, mode
, extracts the most frequent prices from freq
using a JOIN. It also uses GROUP_CONCAT() because it's possible to have more than one mode--most frequent price.
For your stats you can do this:
WITH s7 AS (
SELECT ROUND(MIN(preconormal), 2) minp,
ROUND(AVG(preconormal), 2) meanp,
ROUND(MAX(preconormal), 2) maxp,
codigowine
FROM precos
WHERE timestamp >= CURRENT_DATE - INTERVAL 7 DAY
GROUP BY codigowine
),
s15 AS (
SELECT ROUND(MIN(preconormal), 2) minp,
ROUND(AVG(preconormal), 2) meanp,
ROUND(MAX(preconormal), 2) maxp,
codigowine
FROM precos
WHERE timestamp >= CURRENT_DATE - INTERVAL 15 DAY
GROUP BY codigowine
),
s30 AS (
SELECT ROUND(MIN(preconormal), 2) minp,
ROUND(AVG(preconormal), 2) meanp,
ROUND(MAX(preconormal), 2) maxp,
codigowine
FROM precos
WHERE timestamp >= CURRENT_DATE - INTERVAL 30 DAY
GROUP BY codigowine
),
m7 AS (
WITH freq AS (
SELECT COUNT(*) freq,
ROUND(preconormal, 2) preconormal,
codigowine
FROM precos
WHERE timestamp >= CURRENT_DATE - INTERVAL 7 DAY
GROUP BY ROUND(preconormal, 2), codigowine
),
most AS (
SELECT MAX(freq) freq,
codigowine
FROM freq
GROUP BY codigowine
),
mode AS (
SELECT GROUP_CONCAT(preconormal ORDER BY preconormal DESC) modeps,
freq.codigowine
FROM freq
JOIN most ON freq.freq = most.freq
GROUP BY freq.codigowine
)
SELECT * FROM mode
)
SELECT v.codigowine, v.nomevinho, DATE(timestamp) AS data_adc,
s7.minp min_7_dias, s7.maxp max_7_dias, s7.meanp media_7_dias, m7.modeps moda_7_dias,
s15.minp min_15_dias, s15.maxp max_15_dias, s15.meanp media_15_dias,
s30.minp min_30_dias, s30.maxp max_30_dias, s30.meanp media_30_dias
FROM vinhos v
LEFT JOIN s7 ON v.codigowine = s7.codigowine
LEFT JOIN m7 ON v.codigowine = m7.codigowine
LEFT JOIN s15 ON v.codigowine = s15.codigowine
LEFT JOIN s30 ON v.codigowine = s30.codigowine
I'll leave it to you to do the modes for 15 and 30 days.
This is quite the query. You better hope the next guy to work on it doesn't curse your name. :-)