SQL: When, why and how to use flags to check conditions

To understand the question, let's have a look on an example that might occur in real life...

So let's assume we have opened an ice cream parlour in year 2020. We would like to know which drinks have most been sold. Now, in year 2022, we want as example find out whether hot drinks have been sold in a sufficient quantity and amount or if we should only sell cold drinks in future.

In order to keep things simple, we assume ice cream and other things that are no drinks are listed separately, so we don't need to do this in our analysis. Let's say we have a very simple DB table drinks with some useful columns. To keep things simple again, we already see the quantity and amount of the articles as sum per year:

SELECT name,quantity,amount,year
FROM drinks
ORDER BY name,year;

name	quantity	amount	year
coffee	333	832.5	2020
coffee	150	375	2021
coffee	15	37.5	2022
coke	200	500	2020
coke	200	500	2021
coke	200	500	2022

Ok, for only two articles, this is very easy. We can directly compare the amount and quantity of sold coffees and sold cokes. But let's be more realistic and say that also further drinks have been sold: espresso, cappuccino, water, sprite. Now, we could think: No problem, we just check the name of the articles. So we write some conditions like these:

for hot drinks: name IN('coffee','cappuccino','espresso')

for cold drinks: name IN('coke','water','sprite')

But let's assume we executed a query with these conditions and noticed the outcome still can't be correct. So we found out that since 2021, also tea has been sold....Again, we change the condition for hot drinks:

name IN('coffee','cappuccino','espresso') 
OR name LIKE '%tea%'

This seems to be correct for 2020 and 2021. But for 2022, it seems to be wrong. Why that? After analyzing all entries of the table, we find out that in year 2022, also ice tea has been sold. Our condition counts ice tea as hot drink and this is of course incorrect. So we go to change our condition once again...Let's assume, our entire query is now this one:

SELECT 
SUM(CASE WHEN name IN('coffee','cappuccino','espresso') 
OR (name LIKE '%tea%' AND name NOT LIKE '%ice%')
THEN quantity ELSE 0 END) AS quantityHotDrinks,
SUM(CASE WHEN name IN('coffee','cappuccino','espresso') 
OR (name LIKE '%tea%' AND name NOT LIKE '%ice%')
THEN amount ELSE 0 END) AS amountHotDrinks,
SUM(CASE WHEN name IN('coke','water','sprite') 
OR name LIKE '%ice tea%'
THEN quantity ELSE 0 END) AS quantityColdDrinks,
SUM(CASE WHEN name IN('coke','water','sprite') 
OR name LIKE '%ice tea%'
THEN amount ELSE 0 END) AS amountColdDrinks,
year
FROM drinks
GROUP BY year

Wow, this is really long and bad to read. And it's very risky, too. As long as we just do queries and have a look, the risk may be low. But if we want to take decisions about which drinks we still want to offer and which ones not, we should be safe with our data, shouldn't we? In case for example the coke will next year be separated in one drink "coke zero" and another one "coke light" and maybe a third one "normal coke", should the conditions be changed once again? The more conditions we use, the higher is the risk of getting incorrect results. And the more difficult will it be to find out they are incorrect and what exactly is wrong. So how do we prevent this risky checking of conditions and can be sure that our results are correct?

Solution

Let's start the answer with a link to a DB fiddle example: db<>fiddle

There you can try out and replicate the issues described in the question and in this answer. But please read this answer first. So let's remember: We created a bad to read and complex query (see the question above) and the result set of this query is something like this:

Table drinks

quantityHotDrinks	amountHotDrinks	quantityColdDrinks	amountColdDrinks	year
588	1563	5700	6500	2020
1281	3247.8	5700	6500	2021
2030	5080.0	13700	27000	2022

Let's assume the conditions in our query are correct and let's have a look on them again:

for hot drinks:

name IN('coffee','cappuccino','espresso') 
OR (name LIKE '%tea%' AND name NOT LIKE '%ice%')

for cold drinks:

name IN('coke','water','sprite') 
OR name LIKE '%ice tea%'

So, after having a look again on both query and result set, it should be obvious that we only need one single condition to check and can get rid of all the rest: Is it a hot drink or not? So, this is a typical use case for a DB flag. Flags are quite common and of course not be used in SQL only. So, in many cases - and one of them is our case - such typical yes/no conditions can just be checked by a flag that can be 0 or 1.

Let's say we want to name this flag hotDrink. Some people prefer something like ishotDrink or even name it kindofDrink and fill it with "hot" and "cold" instead of 1 and 0. But we want to focus on how to use such a flag and not begin a discussion about the best name. If we had been more clever and experienced, we had created this flag already when creating the DB and wouldn't need to do this now.

But anyway, according to our assumption, we know the cases when the flag should be 1 and when it should be 0, we will execute a command like this to add the flag as further column to our table:

ALTER TABLE drinks
ADD hotDrink INT NOT NULL DEFAULT 0 
CHECK (hotDrink IN (0,1));

What does this mean? Well, we assume the default case of an article is that it's a cold drink and we say that all articles must either be a cold drink or a hot drink. If we want to create a flag that allows to leave the flag empty (maybe to show "unknown" in the result set), we just remove the part IS NOT NULL and if we also don't want to have a default value, we remove the part DEFAULT 0,too. The check then could be CHECK (hotDrink IN (0,1) OR hotdrink IS NULL).

After creating our flag, we want to update all existing drinks to tell them if they are a hot drink or not. So we use our conditions from above and execute an update command:

UPDATE drinks
SET hotDrink = 1
WHERE name IN('coffee','cappuccino','espresso') 
OR (name LIKE '%tea%' AND name NOT LIKE '%ice%');

And from now on, we will no longer use our strange and risky conditions in our queries. The risk that someone will fill this flag incorrectly will be much lower than we will miss some condition when checking the name of the article.

If we want to be even more sure that the flag is filled correctly, we can extend the application which is used to add or change articles in our DB. We can for example add a confirmation dialog that shows all properties of the article before saving it whenever an insert or update should be done.

To come back to our question, we just need one last step: Remove the conditions from our previous query and just check whether the flag is 1 or 0. As example:

SELECT 
SUM(CASE WHEN hotDrink = 1 
THEN quantity ELSE 0 END) AS quantityHotDrinks,
SUM(CASE WHEN hotDrink = 1 
THEN amount ELSE 0 END) AS amountHotDrinks,
SUM(CASE WHEN hotDrink = 0 
THEN quantity ELSE 0 END) AS quantityColdDrinks,
SUM(CASE WHEN hotDrink = 0  
THEN amount ELSE 0 END) AS amountColdDrinks,
year
FROM drinks
GROUP BY year

If we didn't make some mistakes, the result is still the same:

Table drinks

quantityHotDrinks	amountHotDrinks	quantityColdDrinks	amountColdDrinks	year
588	1563	5700	6500	2020
1281	3247.8	5700	6500	2021
2030	5080.0	13700	27000	2022

Great, so in future, all our queries will just check this flag and the risk we get incorrect data will be very low.