Search code examples
mysqlsql-serverdatabasedatabase-administration

Case Statement Variable Declaration and use IN MS SQL SERVER


I am using MySQL for my Shiny App. I am using this query for App and it is running perfectly okay.

Select
    concat(monthname(date_of_test), '-', year(date_of_test)) as 'Time',
    product_group AS 'ProductGroup',
    Pass,
    Case
        when pass='N' then @no:=count(distinct serial_number)
        when pass='Y' then count(distinct serial_number)-@no
    end as Count
from test_data 
where 
    year(date_of_test)=2018 
    and product_group='BHO'
    and month(date_of_test) between 3 and 4
group by
    product_group,
    month(date_of_test),
    pass

But I need to change it in MS SQL Server. I have tried with declaring as variable and use it as in SQL Server.

My try in SQL Server:

declare @no int;
set @no = 0;
Select
    CONCAT(datename(MM, date_of_test), '-', DATENAME(YY,date_of_test)) as 'Time',
    product_group AS 'ProductGroup',
    Pass,
    case
        when pass ='N' then  @no = count(distinct serial_number)    
        when pass ='Y' then count(distinct serial_number)- @no  
    end as 'Count'
from test_data 
where
    year(date_of_test)=2018 
    and product_group='BHO'
    and month(date_of_test) between 3 and 5
group by
    product_group,
    CONCAT(datename(MM, date_of_test),
    '-',
    DATENAME(YY,date_of_test)),
    pass    

The query without the variable is like:

 Select
    CONCAT(datename(MM, date_of_test), '-', DATENAME(YY,date_of_test)) as 'Time',
    product_group AS 'ProductGroup',
    Pass,
    case
        when pass ='N' then count(distinct serial_number)    
        when pass ='Y' then count(distinct serial_number)
    end as 'Count'
from test_data 
where 
    year(date_of_test)=2018 and product_group='BHO'
    and month(date_of_test) between 3 and 4
group by
    product_group,
    CONCAT(datename(MM, date_of_test),
    '-',
    DATENAME(YY,date_of_test)),
    pass 

and it is producing the following output:

enter image description here

The desired output was like which is from MySQL. Please take a look where Pass=Y then the value of Pass=N subtracted from it.

enter image description here

It is showing an error.

My initial assumption: in MySQL I can initialize variable in query and can use it within it,but in MS SQL Server may be there is other rules. My syntax or process can be wrong.

Select Count(distinct serial_number) from Test_Data where year(date_of_test)=2018 and product_group='BHO'and month(date_of_test)=4

503

Select Count(distinct serial_number) from Test_Data where year(date_of_test)=2018 and product_group='BHO' and PASS='Y' and month(date_of_test)=4

503

Select Count(distinct serial_number) from Test_Data where year(date_of_test)=2018 and product_group='BHO' and PASS='N'and month(date_of_test)=4

71

SO all 503 product(serial number) gone for multiple test and get Pass=Y value but 71 product have gone through the same test where they have failed in some case where it is noted as Pass=N. So if I can calculate the (distinct serial_number with PASS=y)-(distinct serial_number with PASS=N) then it will give number of products who pass all the tests.

I can do this and the result is:

Select CONCAT(datename(MM, date_of_test),'-',DATENAME(YY,date_of_test)) as 'Time',product_group AS 'ProductGroup',
                    (Count(Distinct case when PASS='Y' then serial_number end)-Count(Distinct case when PASS='N' then serial_number end)) 
                 as ' All Test Passed',
                 Count(Distinct case when PASS='N' then serial_number end) as 'Min 1 Test Failed'
               from test_data 
               where 
               year(date_of_test)=2018 
               and 
               month(date_of_test) between 3 and 4
               and product_group='BHO'
               group by product_group,CONCAT(datename(MM, date_of_test),'-',DATENAME(YY,date_of_test))

And the result is

enter image description here


Solution

  • Looks like the MySQL query is trying to emulate the LEAD() or LAG() analytic functions introduced in MySQL 8. Those were already available in SQL Server since 2012 (I think).

    The MySQL query assumes results will be returned in a certain order, even though there's no ORDER BY clause. It also assumes there's no parallel processing, at least when the variable is processed.

    The entire CASE can be rewritten as :

    count(distinct serial_number) - 
    LAG(count(distinct serial_number),1,0) OVER (
                            PARTITION BY product_group,month 
                            ORDER BY pass)
    

    This partions the GROUP BY results by product_group,month and then orders them by pass. LAG then returns the previous count in that partition, or 0 if there's no previous row. This means that LAG() will return 0 for N and N's count for Y

    The complete query would look like this :

    select 
        year(date_of_test),
        month(date_of_test),
        product_group,
        pass,
        count(distinct serial_number) - 
        LAG( COUNT(distinct serial_number),1,0) 
             OVER ( PARTITION BY product_group,month(date_of_test) 
                    ORDER BY pass)
    from test_data
    where 
        year(date_of_test)=2018 
        and month(date_of_test) between 3 and 4
        and product_group='BHO'
    group by 
        year(date_of_test),
        month(date_of_test),
        product_group,
        pass
    

    A similar query could work with MySQL 8.

    Performance and the query itself can be improved a lot by using a Calendar table. A calendar table is a table pre-populated with eg 20 years of dates which contains extra fields like month, month name, week number, work or holiday etc. This makes writing date-based queries a lot easier and the resulting queries a lot faster.

    Assuming there was a calendar table with only a few basic fields like date, year, month, mont_name, one could simplify the query to this :

    select 
        calendar.month_name + '-' + calendar.year,
        product_group,
        pass,
        count(distinct serial_number) - 
        LAG( COUNT(distinct serial_number),1,0) 
             OVER ( PARTITION BY product_group,calendar.month 
                    ORDER BY pass)
    from 
        test_data
        inner join calendar on date_of_test=date
    where 
        calendar.year =2018 
        and calendar.month between 3 and 4
        and product_group='BHO'
    group by 
        calendar.year,
        calendar.month,
        product_group,
        pass
    

    This query can take advantage of indexes on the date_of_test, calendar.date, calendar.year and calendar.month columns to find results.