Hi I was trying to work on a teradata sql problem where I need to exclude all saledate in Aug 2005 and calculate the daily revenue for each store/month/year combination for any stores that have no fewer than 20 sale days.
My idea is to generate a subset of the saledate column in the subquery and work with it. And here is my code.
SELECT Sub.store, Sub.Year_, Sub.Month_, Sub.TotalSaleDate, Sub.Daily_rev, sub.Total_rev
FROM (SELECT (CASE WHEN (NOT (EXTRACT(MONTH from saledate)=8
AND EXTRACT(YEAR from saledate)=2005)) THEN saledate END) AS
NewSaleDate, COUNT(NewSaleDate) AS TotalSaleDate,
SUM(amt) AS Total_rev,
Total_rev/TotalSaleDate AS Daily_rev,
EXTRACT(MONTH from NewSaleDate) AS Month_,
EXTRACT(YEAR from NewSaleDate) AS Year_, store
FROM trnsact
WHERE stype = 'P' AND saledate = NewSaleDate
GROUP BY store, Year_, Month_, NewSaleDate) AS Sub
WHERE Sub.TotalSaleDate >= 20
ORDER BY sub.TotalSaledate ASC;
And this is my output My result
Here is code from someone that worked
SELECT
sub.store, sub.year_num, sub.month_num, sub.num_dates, sub.daily_revenue
FROM (
SELECT
store,
EXTRACT (month FROM saledate) AS month_num,
EXTRACT (year FROM saledate) AS year_num,
COUNT (DISTINCT saledate) AS num_dates,
SUM(amt) AS total_revenue,
total_revenue/num_dates AS daily_revenue,
(CASE
WHEN (year_num=2005 AND month_num=8) THEN 'cannot' ELSE 'can'
END) As can_use_anot
FROM trnsact
WHERE stype='p' AND can_use_anot='can'
GROUP BY store, month_num, year_num
) AS sub
HAVING sub.num_dates >=20
GROUP BY sub.store, sub.year_num, sub.month_num, sub.num_dates, sub.daily_revenue
ORDER BY sub.num_dates ASC;
And his result Correct result
Apparently his Daily revenue is much higher than mine. I wonder whether it is due to I am not counting Distinct saledate in the subquery. However, I tried to add use COUNT(DISTINCT saledate) and I get no output at all, 0 row. I understand how his code works but I`m frustrated where is wrong in my code. ESPECIALLY WHY ADDING DISTINCT GAVE ME 0 ROW, DEEPLY APPRECIATE ANYONE WHO CAN EXPLAIN...
Select #1 agggregates by date while #2 by month, just compare the number of rows returned.
What I don't understand why both queries use that strange way to exclude dates from Aug 2005. Should be done using a simple WHERE saledate not between date '2005-08-01' and date '2005-08-31'
And there's no need for using a Derived Table:
SELECT
store,
EXTRACT (month FROM saledate) AS month_num,
EXTRACT (year FROM saledate) AS year_num,
COUNT (DISTINCT saledate) AS num_dates,
SUM(amt) AS total_revenue,
total_revenue/num_dates AS daily_revenue
FROM trnsact
WHERE saledate not between date '2005-08-01' and date '2005-08-31'
and stype='p'
GROUP BY store, month_num, year_num
HAVING num_dates >=20
ORDER BY sub.num_dates ASC;
But there's probably a large number of rows per day/month and it might be more efficient to aggregate by day first avoiding EXTRACT on all rows and the COUNT(DISTINCT):
SELECT -- now aggregate by month
store,
EXTRACT (month FROM saledate) AS month_num,
EXTRACT (year FROM saledate) AS year_num,
COUNT (*) AS num_dates, -- no need for DISTINCT anymore
SUM(daily_amt) AS total_revenue,
total_revenue/num_dates AS daily_revenue
FROM
( -- daily sales first
SELECT
store,
saledate,
SUM(amt) AS daily_amt
FROM trnsact
WHERE saledate not between date '2005-08-01' and date '2005-08-31'
and stype='p'
GROUP BY store, saledate
) as dt
GROUP BY store, year_num, month_num
;