I have two tables:
TESTS:
ID | YEAR | DESCRIPTION |
---|---|---|
0 | 2023 | some text |
.. | ... | ... |
N | 2024 | some text |
test_prep_materials:
ID | PATH | TID |
---|---|---|
0 | /home/user/file0.pdf | 0 |
.. | ... | ... |
M | /home/user/fileM.pdf | 0 |
So any test have prep materials. And some test's have same material. I want to delete all material's for 2023 tests and DELETE FILES FROM DISK. But 2024 test's may have same path (SAME FILE ON DISK). So i need to pick all prep materials for 2023 tests and exclude from them all path's which have dependencies in 2024 tests.
I have tried the following, but there are too many results to know if the command was executed correctly.
SELECT `test_prep_materials`.`PATH`, `tests`.`id`, `tests`.`year`, `tests`.`Name_t`
from `test_prep_materials`
JOIN `tests` ON `test_prep_materials`.`tId` = `tests`.`id`
WHERE `tests`.`year` = 2023
EXCEPT
SELECT `test_prep_materials`.`PATH`, `tests`.`id`, `tests`.`year`, `tests`.`Name_t`
from `test_prep_materials`
JOIN `tests` ON `test_prep_materials`.`tId` = `tests`.`id`
WHERE `tests`.`year` <> 2023
Can somebody help me understand if my Query right or i am missing something?
It looks like you need paths to delete from past years tests that are not used in any test of this year. If that is the case that means you need paths used in last year and last year only - use Max(YEAR) = Min(YEAR) per PATH and < Max(YEAR) over all:
WITH -- S a m p l e D a t a :
tests ( ID, YEAR, DESCRIPTION ) AS
( Select 0, 2023, 'some text 0' Union All
Select 1, 2024, 'some text 1' Union All
Select 2, 2023, 'some text 2' Union All
Select 3, 2024, 'some text 3' Union All
Select 4, 2024, 'some text 4'
),
test_prep_materials ( ID, PATH, TID ) AS
( Select 100, '/home/user/file0.pdf', 0 Union All
Select 101, '/home/user/file0.pdf', 1 Union All
Select 102, '/home/user/file3.pdf', 1 Union All
Select 103, '/home/user/file3.pdf', 2 Union All
Select 104, '/home/user/file4.pdf', 3 Union All
Select 105, '/home/user/file4.pdf', 2 Union All
Select 106, '/home/user/file4.pdf', 1 Union All
Select 107, '/home/user/file5.pdf', 4 Union All
Select 108, '/home/user/file6.pdf', 0
)
-- S Q L :
Select tpm.PATH, Max(t.YEAR) as MAX_YEAR, Min(t.YEAR) as MIN_YEAR
From tests t
Left Join test_prep_materials tpm ON(tpm.TID = t.ID)
Group By tpm.PATH
HAVING Max(t.YEAR) < ( Select Max(YEAR) From tests )
/* R e s u l t :
PATH MAX_YEAR MIN_YEAR
---------------------- -------- --------
/home/user/file6.pdf 2023 2023 */
OR with all the data from tests table:
-- S Q L :
SELECT t.*, tpm.PATH
FROM tests t
INNER JOIN ( Select tpm.PATH, Max(t.YEAR) as MAX_YEAR, Min(t.YEAR) as MIN_YEAR
From tests t
Left Join test_prep_materials tpm ON(tpm.TID = t.ID)
Group By tpm.PATH
HAVING Max(t.YEAR) < ( Select Max(YEAR) From tests )
) tpm ON( t.YEAR Between tpm.MIN_YEAR And tpm.MAX_YEAR )
Where t.YEAR < ( Select Max(YEAR) From tests )
/* R e s u l t :
ID YEAR DESCRIPTION PATH
-- ---- ----------- ------------------------
0 2023 some text 0 /home/user/file6.pdf
2 2023 some text 2 /home/user/file6.pd */
NOTE:
If you want to reffer to any specific year within a range (like 2021 till 2026) change the subquery selecting Max(YEAR) over all with the specific year - lets say 2025 - and the result should be paths used before 2025 and not in 2025. That was not in your sample data, though.
...
HAVING Max(t.YEAR) < 2025
...
This can be tuned even more with different Where clause conditions.
ONE MORE APPROACH:
If you have the period of years from like 2021 untill 2026 and you want to reffer to 2024 seeking previous paths not present in 2024 then you could do something like below, but with hardcoded refference year (2024):
WITH -- S a m p l e D a t a :
tests ( ID, YEAR, DESCRIPTION ) AS
( Select 0, 2023, 'some text 0' Union All
Select 1, 2024, 'some text 1' Union All
Select 2, 2023, 'some text 2' Union All
Select 3, 2024, 'some text 3' Union All
Select 4, 2024, 'some text 4' Union All
-- new rows below
Select 5, 2021, 'some text 5' Union All
Select 6, 2024, 'some text 6' Union All
Select 7, 2026, 'some text 7'
),
test_prep_materials ( ID, PATH, TID ) AS
( Select 100, '/home/user/file0.pdf', 0 Union All
Select 101, '/home/user/file0.pdf', 1 Union All
Select 102, '/home/user/file3.pdf', 1 Union All
Select 103, '/home/user/file3.pdf', 2 Union All
Select 104, '/home/user/file4.pdf', 3 Union All
Select 105, '/home/user/file4.pdf', 2 Union All
Select 106, '/home/user/file4.pdf', 1 Union All
Select 107, '/home/user/file5.pdf', 4 Union All
Select 108, '/home/user/file6.pdf', 0 Union All
-- new rows below
Select 108, '/home/user/file6.pdf', 5 Union All
Select 108, '/home/user/file5.pdf', 6 Union All
Select 108, '/home/user/file5.pdf', 7
)
-- S Q L :
Select tpm.PATH, Max(t.YEAR) as MAX_YEAR, Min(t.YEAR) as MIN_YEAR
From tests t
Left Join test_prep_materials tpm ON(tpm.TID = t.ID)
Group By tpm.PATH
HAVING Count((Select 1 From tests Where ID = t.ID And Year = 2024)) = 0 And
Max(Case When t.YEAR <= 2024 Then t.YEAR Else 9999 End ) < 2024
/* R e s u l t :
PATH MAX_YEAR MIN_YEAR
---------------------- -------- --------
/home/user/file6.pdf 2023 2021 */