1st Table : AggregateValuesSum - SumProfit and NbCustomers are calculated group by all others characteristics columns.
Date MonthStatus Gender Age Income State SumProfit NbCustomers
2011-01-01 Status1 F 50 12000 CA 450000 3
2011-01-01 Status2 F 50 12000 CA 450000 2
2011-01-01 Status1 F 50 12000 CA 46000 2
2012-01-01 Status1 H 25 15000 NF -500 1
2012-01-01 Status2 H 45 20000 QC 12000 5
2nd Table : AggregatesBudget - Same table with AggregatesValueSum without MonthStatus column. SumBudget is calculated group by all others characteristics (For others analysis):
Date Gender Age Income State SumBudget
2011-01-01 F 50 12000 CA 30000
2012-01-01 H 25 15000 NF 10000
2012-01-01 H 45 20000 QC 5000
Final table : ConsolidationTable - Left join of 2 first tables on commons rows ( Gender, Age, Income, State). So i have all characteristics and all my calculated values.
Date MonthStatus Gender Age Income State SumProfit NbCustomers SumBudget
2011-01-01 Status1 F 50 12000 CA 450000 3 30000
2011-01-01 Status2 F 50 12000 CA 450000 2 30000
2011-01-01 Status1 F 50 12000 CA 46000 2 30000
2012-01-01 Status1 H 25 15000 NF -500 1 -20000
2012-01-01 Status2 H 45 20000 QC 12000 5 5000
As you can see, SumBudget is duplicated in this final table for rows that have same characteristics without considering the Sumprofit and MonthStatus. (Which is normal)
Everything seems fine for now. Then i validate the total of SumBudget per Date and compare the result given in 2nd tables and Final table :
Select
Date,
Sum(SumBudget) as TotalBudgetPerDate
from AggregatesBudget
group by Date
For some dates it give me a different result when i do this :
Select
Date,
Sum( distinct SumBudget) as TotalBudgetPerDate
from ConsolidationTable
group by Date
THE PROBLEMS :
I have to add a distinct in my 2nd query because in consolidation table there is duplicated SumBudget
For most of Date I have same result with AggregatesBudget Table. But sometimes the result in the ConsolidationTable is not right for some dates.
I don't understand why!
NB: Here was fake data in reality I have 49Md rows. And I won't share real data because of sensitivity.
This will occur if you have duplicate values in the aggregatesbudget
table on a single date (assuming you have no duplicate keys overall). You can check this using:
select date, sumbudget, count(*)
from aggregatesbudget
group by date, sumbudget
having count(*) > 1;
You could fix this using window functions:
select date, sum(sumbudget)
from (select ct.*,
row_number() over (partition by Gender, Age, Income, State, date order by date) as seqnum
from ConsolidationTable ct
) ct
where seqnum = 1
group by date;