I am using MySQL for my Shiny App. I am using this query for App and it is running perfectly okay.
Select
concat(monthname(date_of_test), '-', year(date_of_test)) as 'Time',
product_group AS 'ProductGroup',
Pass,
Case
when pass='N' then @no:=count(distinct serial_number)
when pass='Y' then count(distinct serial_number)-@no
end as Count
from test_data
where
year(date_of_test)=2018
and product_group='BHO'
and month(date_of_test) between 3 and 4
group by
product_group,
month(date_of_test),
pass
But I need to change it in MS SQL Server. I have tried with declaring as variable and use it as in SQL Server.
My try in SQL Server:
declare @no int;
set @no = 0;
Select
CONCAT(datename(MM, date_of_test), '-', DATENAME(YY,date_of_test)) as 'Time',
product_group AS 'ProductGroup',
Pass,
case
when pass ='N' then @no = count(distinct serial_number)
when pass ='Y' then count(distinct serial_number)- @no
end as 'Count'
from test_data
where
year(date_of_test)=2018
and product_group='BHO'
and month(date_of_test) between 3 and 5
group by
product_group,
CONCAT(datename(MM, date_of_test),
'-',
DATENAME(YY,date_of_test)),
pass
The query without the variable is like:
Select
CONCAT(datename(MM, date_of_test), '-', DATENAME(YY,date_of_test)) as 'Time',
product_group AS 'ProductGroup',
Pass,
case
when pass ='N' then count(distinct serial_number)
when pass ='Y' then count(distinct serial_number)
end as 'Count'
from test_data
where
year(date_of_test)=2018 and product_group='BHO'
and month(date_of_test) between 3 and 4
group by
product_group,
CONCAT(datename(MM, date_of_test),
'-',
DATENAME(YY,date_of_test)),
pass
and it is producing the following output:
The desired output was like which is from MySQL. Please take a look where Pass=Y then the value of Pass=N subtracted from it.
It is showing an error.
My initial assumption: in MySQL I can initialize variable in query and can use it within it,but in MS SQL Server may be there is other rules. My syntax or process can be wrong.
Select Count(distinct serial_number) from Test_Data where year(date_of_test)=2018 and product_group='BHO'and month(date_of_test)=4
503
Select Count(distinct serial_number) from Test_Data where year(date_of_test)=2018 and product_group='BHO' and PASS='Y' and month(date_of_test)=4
503
Select Count(distinct serial_number) from Test_Data where year(date_of_test)=2018 and product_group='BHO' and PASS='N'and month(date_of_test)=4
71
SO all 503 product(serial number) gone for multiple test and get Pass=Y value but 71 product have gone through the same test where they have failed in some case where it is noted as Pass=N. So if I can calculate the (distinct serial_number with PASS=y)-(distinct serial_number with PASS=N) then it will give number of products who pass all the tests.
I can do this and the result is:
Select CONCAT(datename(MM, date_of_test),'-',DATENAME(YY,date_of_test)) as 'Time',product_group AS 'ProductGroup',
(Count(Distinct case when PASS='Y' then serial_number end)-Count(Distinct case when PASS='N' then serial_number end))
as ' All Test Passed',
Count(Distinct case when PASS='N' then serial_number end) as 'Min 1 Test Failed'
from test_data
where
year(date_of_test)=2018
and
month(date_of_test) between 3 and 4
and product_group='BHO'
group by product_group,CONCAT(datename(MM, date_of_test),'-',DATENAME(YY,date_of_test))
And the result is
Looks like the MySQL query is trying to emulate the LEAD()
or LAG()
analytic functions introduced in MySQL 8. Those were already available in SQL Server since 2012 (I think).
The MySQL query assumes results will be returned in a certain order, even though there's no ORDER BY clause. It also assumes there's no parallel processing, at least when the variable is processed.
The entire CASE
can be rewritten as :
count(distinct serial_number) -
LAG(count(distinct serial_number),1,0) OVER (
PARTITION BY product_group,month
ORDER BY pass)
This partions the GROUP BY
results by product_group,month
and then orders them by pass
. LAG
then returns the previous count in that partition, or 0 if there's no previous row. This means that LAG()
will return 0 for N
and N's count for Y
The complete query would look like this :
select
year(date_of_test),
month(date_of_test),
product_group,
pass,
count(distinct serial_number) -
LAG( COUNT(distinct serial_number),1,0)
OVER ( PARTITION BY product_group,month(date_of_test)
ORDER BY pass)
from test_data
where
year(date_of_test)=2018
and month(date_of_test) between 3 and 4
and product_group='BHO'
group by
year(date_of_test),
month(date_of_test),
product_group,
pass
A similar query could work with MySQL 8.
Performance and the query itself can be improved a lot by using a Calendar table. A calendar table is a table pre-populated with eg 20 years of dates which contains extra fields like month, month name, week number, work or holiday etc. This makes writing date-based queries a lot easier and the resulting queries a lot faster.
Assuming there was a calendar
table with only a few basic fields like date, year, month, mont_name, one could simplify the query to this :
select
calendar.month_name + '-' + calendar.year,
product_group,
pass,
count(distinct serial_number) -
LAG( COUNT(distinct serial_number),1,0)
OVER ( PARTITION BY product_group,calendar.month
ORDER BY pass)
from
test_data
inner join calendar on date_of_test=date
where
calendar.year =2018
and calendar.month between 3 and 4
and product_group='BHO'
group by
calendar.year,
calendar.month,
product_group,
pass
This query can take advantage of indexes on the date_of_test
, calendar.date
, calendar.year
and calendar.month
columns to find results.