Search code examples
sqlconditional-statementsflags

SQL: When, why and how to use flags to check conditions


To understand the question, let's have a look on an example that might occur in real life...

So let's assume we have opened an ice cream parlour in year 2020. We would like to know which drinks have most been sold. Now, in year 2022, we want as example find out whether hot drinks have been sold in a sufficient quantity and amount or if we should only sell cold drinks in future.

In order to keep things simple, we assume ice cream and other things that are no drinks are listed separately, so we don't need to do this in our analysis. Let's say we have a very simple DB table drinks with some useful columns. To keep things simple again, we already see the quantity and amount of the articles as sum per year:

SELECT name,quantity,amount,year
FROM drinks
ORDER BY name,year;
name quantity amount year
coffee 333 832.5 2020
coffee 150 375 2021
coffee 15 37.5 2022
coke 200 500 2020
coke 200 500 2021
coke 200 500 2022

Ok, for only two articles, this is very easy. We can directly compare the amount and quantity of sold coffees and sold cokes. But let's be more realistic and say that also further drinks have been sold: espresso, cappuccino, water, sprite. Now, we could think: No problem, we just check the name of the articles. So we write some conditions like these:

for hot drinks: name IN('coffee','cappuccino','espresso')

for cold drinks: name IN('coke','water','sprite')

But let's assume we executed a query with these conditions and noticed the outcome still can't be correct. So we found out that since 2021, also tea has been sold....Again, we change the condition for hot drinks:

name IN('coffee','cappuccino','espresso') 
OR name LIKE '%tea%'

This seems to be correct for 2020 and 2021. But for 2022, it seems to be wrong. Why that? After analyzing all entries of the table, we find out that in year 2022, also ice tea has been sold. Our condition counts ice tea as hot drink and this is of course incorrect. So we go to change our condition once again...Let's assume, our entire query is now this one:

SELECT 
SUM(CASE WHEN name IN('coffee','cappuccino','espresso') 
OR (name LIKE '%tea%' AND name NOT LIKE '%ice%')
THEN quantity ELSE 0 END) AS quantityHotDrinks,
SUM(CASE WHEN name IN('coffee','cappuccino','espresso') 
OR (name LIKE '%tea%' AND name NOT LIKE '%ice%')
THEN amount ELSE 0 END) AS amountHotDrinks,
SUM(CASE WHEN name IN('coke','water','sprite') 
OR name LIKE '%ice tea%'
THEN quantity ELSE 0 END) AS quantityColdDrinks,
SUM(CASE WHEN name IN('coke','water','sprite') 
OR name LIKE '%ice tea%'
THEN amount ELSE 0 END) AS amountColdDrinks,
year
FROM drinks
GROUP BY year

Wow, this is really long and bad to read. And it's very risky, too. As long as we just do queries and have a look, the risk may be low. But if we want to take decisions about which drinks we still want to offer and which ones not, we should be safe with our data, shouldn't we? In case for example the coke will next year be separated in one drink "coke zero" and another one "coke light" and maybe a third one "normal coke", should the conditions be changed once again? The more conditions we use, the higher is the risk of getting incorrect results. And the more difficult will it be to find out they are incorrect and what exactly is wrong. So how do we prevent this risky checking of conditions and can be sure that our results are correct?


Solution

  • Let's start the answer with a link to a DB fiddle example: db<>fiddle

    There you can try out and replicate the issues described in the question and in this answer. But please read this answer first. So let's remember: We created a bad to read and complex query (see the question above) and the result set of this query is something like this:

    Table drinks

    quantityHotDrinks amountHotDrinks quantityColdDrinks amountColdDrinks year
    588 1563 5700 6500 2020
    1281 3247.8 5700 6500 2021
    2030 5080.0 13700 27000 2022

    Let's assume the conditions in our query are correct and let's have a look on them again:

    for hot drinks:

    name IN('coffee','cappuccino','espresso') 
    OR (name LIKE '%tea%' AND name NOT LIKE '%ice%')
    

    for cold drinks:

    name IN('coke','water','sprite') 
    OR name LIKE '%ice tea%'
    

    So, after having a look again on both query and result set, it should be obvious that we only need one single condition to check and can get rid of all the rest: Is it a hot drink or not? So, this is a typical use case for a DB flag. Flags are quite common and of course not be used in SQL only. So, in many cases - and one of them is our case - such typical yes/no conditions can just be checked by a flag that can be 0 or 1.

    Let's say we want to name this flag hotDrink. Some people prefer something like ishotDrink or even name it kindofDrink and fill it with "hot" and "cold" instead of 1 and 0. But we want to focus on how to use such a flag and not begin a discussion about the best name. If we had been more clever and experienced, we had created this flag already when creating the DB and wouldn't need to do this now.

    But anyway, according to our assumption, we know the cases when the flag should be 1 and when it should be 0, we will execute a command like this to add the flag as further column to our table:

    ALTER TABLE drinks
    ADD hotDrink INT NOT NULL DEFAULT 0 
    CHECK (hotDrink IN (0,1));
    

    What does this mean? Well, we assume the default case of an article is that it's a cold drink and we say that all articles must either be a cold drink or a hot drink. If we want to create a flag that allows to leave the flag empty (maybe to show "unknown" in the result set), we just remove the part IS NOT NULL and if we also don't want to have a default value, we remove the part DEFAULT 0,too. The check then could be CHECK (hotDrink IN (0,1) OR hotdrink IS NULL).

    After creating our flag, we want to update all existing drinks to tell them if they are a hot drink or not. So we use our conditions from above and execute an update command:

    UPDATE drinks
    SET hotDrink = 1
    WHERE name IN('coffee','cappuccino','espresso') 
    OR (name LIKE '%tea%' AND name NOT LIKE '%ice%');
    

    And from now on, we will no longer use our strange and risky conditions in our queries. The risk that someone will fill this flag incorrectly will be much lower than we will miss some condition when checking the name of the article.

    If we want to be even more sure that the flag is filled correctly, we can extend the application which is used to add or change articles in our DB. We can for example add a confirmation dialog that shows all properties of the article before saving it whenever an insert or update should be done.

    To come back to our question, we just need one last step: Remove the conditions from our previous query and just check whether the flag is 1 or 0. As example:

    SELECT 
    SUM(CASE WHEN hotDrink = 1 
    THEN quantity ELSE 0 END) AS quantityHotDrinks,
    SUM(CASE WHEN hotDrink = 1 
    THEN amount ELSE 0 END) AS amountHotDrinks,
    SUM(CASE WHEN hotDrink = 0 
    THEN quantity ELSE 0 END) AS quantityColdDrinks,
    SUM(CASE WHEN hotDrink = 0  
    THEN amount ELSE 0 END) AS amountColdDrinks,
    year
    FROM drinks
    GROUP BY year
    

    If we didn't make some mistakes, the result is still the same:

    Table drinks

    quantityHotDrinks amountHotDrinks quantityColdDrinks amountColdDrinks year
    588 1563 5700 6500 2020
    1281 3247.8 5700 6500 2021
    2030 5080.0 13700 27000 2022

    Great, so in future, all our queries will just check this flag and the risk we get incorrect data will be very low.