I have a table with name_id, date and col1-col5 with rates. I'm looking for a solution to compare the average of the rates of the latest day with the average of the rates of the day before the latest day. When the average of the latest day is higher then the day before the latest day, then 'up'. When lower, then 'down'.
id | name_id | date | col1 | col2 | col3 | col4 | col5 |
---|---|---|---|---|---|---|---|
1 | 3 | 28-07-2023 | 4,64 | 4,70 | 4,80 | 4,90 | 5,10 |
2 | 3 | 28-07-2023 | 4,75 | 4,86 | 4,87 | 5,05 | 5,15 |
3 | 3 | 28-07-2023 | 4,88 | 4,88 | 4,98 | 5,08 | NULL |
4 | 4 | 28-07-2023 | 4,98 | 5,04 | 5,16 | 5,29 | 5,45 |
5 | 4 | 28-07-2023 | 4,99 | 5,03 | 5,18 | 5,33 | 5,54 |
6 | 3 | 29-07-2023 | 5,00 | 5,07 | 5,15 | 5,29 | 5,52 |
7 | 3 | 29-07-2023 | 5,09 | NULL | 5,22 | 5,37 | 5,52 |
8 | 3 | 29-07-2023 | 5,30 | 5,30 | 5,45 | 5,50 | 5,70 |
9 | 4 | 29-07-2023 | 5,32 | 5,47 | 5,62 | 5,67 | 5,82 |
10 | 4 | 29-07-2023 | 5,32 | 5,42 | 5,52 | 5,62 | 5,67 |
The latest day:
SELECT AVG(CASE
WHEN COALESCE(`col1`, `col2`, `col3`, `col4`, `col5`) IS NOT NULL THEN
(COALESCE(`col1`, 0) + COALESCE(`col2`, 0) + COALESCE(`col3`, 0) + COALESCE(`col4`, 0) + COALESCE(`col5`, 0) ) /
(CASE WHEN `col1` IS NULL THEN 0 ELSE 1 END +
CASE WHEN `col2` IS NULL THEN 0 ELSE 1 END +
CASE WHEN `col3` IS NULL THEN 0 ELSE 1 END +
CASE WHEN `col4` IS NULL THEN 0 ELSE 1 END +
CASE WHEN `col5` IS NULL THEN 0 ELSE 1 END)
END) AS Average
FROM table WHERE `date`= (SELECT MAX(`date`) FROM table) AND name_id=3;
Average: 4.9028
The day before the latest day:
SELECT AVG(CASE
WHEN COALESCE(`col1`, `col2`, `col3`, `col4`, `col5`) IS NOT NULL THEN
(COALESCE(`col1`, 0) + COALESCE(`col2`, 0) + COALESCE(`col3`, 0) + COALESCE(`col4`, 0) + COALESCE(`col5`, 0) ) /
(CASE WHEN `col1` IS NULL THEN 0 ELSE 1 END +
CASE WHEN `col2` IS NULL THEN 0 ELSE 1 END +
CASE WHEN `col3` IS NULL THEN 0 ELSE 1 END +
CASE WHEN `col4` IS NULL THEN 0 ELSE 1 END +
CASE WHEN `col5` IS NULL THEN 0 ELSE 1 END)
END) AS Average
FROM table WHERE `date` = (SELECT MAX(date) FROM table WHERE date < (SELECT MAX(date) FROM table)) AND name_id=3;
Average: 5.3200
This query only compares the rates of the latest day with a 'fixxed' rate of 5.00 (not what I need):
SELECT CASE
WHEN AVG(CASE
WHEN COALESCE(`col1`, `col2`, `col3`, `col4`, `col5`) IS NOT NULL THEN
(COALESCE(`col1`, 0) + COALESCE(`col2`, 0) + COALESCE(`col3`, 0) + COALESCE(`col4`, 0)+ COALESCE(`col5`, 0) ) /
(CASE WHEN `col1` IS NULL THEN 0 ELSE 1 END +
CASE WHEN `col2` IS NULL THEN 0 ELSE 1 END +
CASE WHEN `col3` IS NULL THEN 0 ELSE 1 END +
CASE WHEN `col4` IS NULL THEN 0 ELSE 1 END +
CASE WHEN `col5` IS NULL THEN 0 ELSE 1 END)
END) < 5.00 THEN 'up' ELSE 'down' END AS 'Rate'
FROM table WHERE `date`= (SELECT MAX(`date`) FROM table) AND name_id=3;
Rate: up
How to make a query to compare the 2 outputs?
Edit: @user1191247
What is the reason for having 5 values per row instead of all values for the same name_id/date being in the same row? Besides the mentioned column 'name_id' there is another column 'fixed interest period'. The table contains mortgage interests and the columns col1-col5 represents different risk classes of mortgages. The interest of the mortgage is depending of the risk class and the interest period. One row is representing all classes of a particular interest period. One mortgage has multiple periods. The column name_id is representing the mortgage. In my example there is one mortgage with 3 periods with 5 risk classes per day. So your understanding about a 'set of multiple rows' is correct. You are right about the difference between the the difference between the average of all values, and the average of some intermediate averages. I need to compare the same set of rows. (my sample data is not correct with 1 colum with NULL last day and 4 columns with NULL the day before) So when comparing exactly the same sets of rows, the outcome of averages can be different, but the result (up - down) will be the same. Because your solutions is less code, I'll use it for my project. I'm still looking for a solution to compare the sets of rows with the latest date when the average was not the same. As not all rates change daily, I have a lot of 'Equal' as result. I want to know if the average is higher of lower since the latest change of the rates.
My understanding is that, where there are multiple rows for the same name_id/date pair, the columns col1 - col5 across the multiple rows form a single set. If my understanding is incorrect, you can completely ignore my answer ;-)
This is a contrived example to highlight the importance of understanding the difference between the average of all values, and the average of some intermediate averages (where the number of values in the intermediates varies).
Let's take this modified version of your sample data:
id | name_id | date | col1 | col2 | col3 | col4 | col5 |
---|---|---|---|---|---|---|---|
1 | 3 | 2023-07-28 | 4.64 | 4.70 | 4.80 | 4.90 | 5.10 |
2 | 3 | 2023-07-28 | 4.75 | 4.86 | 4.87 | 5.05 | 5.15 |
3 | 3 | 2023-07-28 | 6.30 | NULL | NULL | NULL | NULL |
4 | 3 | 2023-07-29 | 5.00 | 5.07 | 5.15 | 5.29 | 5.52 |
5 | 3 | 2023-07-29 | 5.09 | NULL | 5.22 | 5.37 | 5.52 |
6 | 3 | 2023-07-29 | 5.30 | 5.30 | 5.45 | 5.50 | 5.70 |
And here is the same data with the sum, value count and average per row added:
id | name_id | date | col1 | col2 | col3 | col4 | col5 | sum | val_count | avg |
---|---|---|---|---|---|---|---|---|---|---|
1 | 3 | 2023-07-28 | 4.64 | 4.70 | 4.80 | 4.90 | 5.10 | 24.14 | 5 | 4.828000 |
2 | 3 | 2023-07-28 | 4.75 | 4.86 | 4.87 | 5.05 | 5.15 | 24.68 | 5 | 4.936000 |
3 | 3 | 2023-07-28 | 6.30 | NULL | NULL | NULL | NULL | 6.30 | 1 | 6.300000 |
4 | 3 | 2023-07-29 | 5.00 | 5.07 | 5.15 | 5.29 | 5.52 | 26.03 | 5 | 5.206000 |
5 | 3 | 2023-07-29 | 5.09 | NULL | 5.22 | 5.37 | 5.52 | 21.20 | 4 | 5.300000 |
6 | 3 | 2023-07-29 | 5.30 | 5.30 | 5.45 | 5.50 | 5.70 | 27.25 | 5 | 5.450000 |
Now let's look at the average of the averages vs the average of all values for each name_id/date grouping:
name_id | date | avg_avg | avg |
---|---|---|---|
3 | 2023-07-28 | 5.3546666667 | 5.010909 |
3 | 2023-07-29 | 5.3186666667 | 5.320000 |
The avg_avg value has decreased between 2023-07-28 and 2023-07-28, but the avg value has increased for the same period.
Modifying Ankit's answer to compare the averages, instead of the averages of intermediate averages, we end up with:
WITH latest_day_avg AS (
SELECT name_id,
SUM(IFNULL(col1, 0) + IFNULL(col2, 0) + IFNULL(col3, 0) + IFNULL(col4, 0) + IFNULL(col5, 0))
/
SUM((`col1` IS NOT NULL) + (`col2` IS NOT NULL) + (`col3` IS NOT NULL) + (`col4` IS NOT NULL) + (`col5` IS NOT NULL)) AS Average
FROM tbl
WHERE `date` = (SELECT MAX(`date`) FROM tbl)
GROUP BY name_id
),
prev_day_avg AS(
SELECT name_id,
SUM(IFNULL(col1, 0) + IFNULL(col2, 0) + IFNULL(col3, 0) + IFNULL(col4, 0) + IFNULL(col5, 0))
/
SUM((`col1` IS NOT NULL) + (`col2` IS NOT NULL) + (`col3` IS NOT NULL) + (`col4` IS NOT NULL) + (`col5` IS NOT NULL)) AS Average
FROM tbl
WHERE `date` = (SELECT MAX(date) FROM tbl WHERE date < (SELECT MAX(date) FROM tbl))
GROUP BY name_id
)
SELECT CASE WHEN lda.average < pda.average THEN 'Down'
WHEN lda.average > pda.average THEN 'Up'
ELSE 'Equal'
END AS Rate
FROM latest_day_avg lda
JOIN prev_day_avg pda ON lda.name_id = pda.name_id
WHERE lda.name_id = 3;
Here's a db<fiddle.