Search code examples
left-joinsql-server-2017

Float value decrease after left join


1st Table : AggregateValuesSum - SumProfit and NbCustomers are calculated group by all others characteristics columns.

Date        MonthStatus Gender   Age   Income   State   SumProfit  NbCustomers
2011-01-01  Status1     F        50    12000    CA       450000    3
2011-01-01  Status2     F        50    12000    CA       450000    2
2011-01-01  Status1     F        50    12000    CA       46000     2
2012-01-01  Status1     H        25    15000    NF       -500      1
2012-01-01  Status2     H        45    20000    QC       12000     5

2nd Table : AggregatesBudget - Same table with AggregatesValueSum without MonthStatus column. SumBudget is calculated group by all others characteristics (For others analysis):

Date         Gender   Age   Income   State      SumBudget
2011-01-01   F        50    12000    CA         30000
2012-01-01   H        25    15000    NF         10000
2012-01-01   H        45    20000    QC         5000

Final table : ConsolidationTable - Left join of 2 first tables on commons rows ( Gender, Age, Income, State). So i have all characteristics and all my calculated values.

Date        MonthStatus Gender   Age   Income   State   SumProfit  NbCustomers  SumBudget
2011-01-01  Status1     F        50    12000    CA       450000    3            30000
2011-01-01  Status2     F        50    12000    CA       450000    2            30000
2011-01-01  Status1     F        50    12000    CA       46000     2            30000
2012-01-01  Status1     H        25    15000    NF       -500      1            -20000
2012-01-01  Status2     H        45    20000    QC       12000     5            5000

As you can see, SumBudget is duplicated in this final table for rows that have same characteristics without considering the Sumprofit and MonthStatus. (Which is normal)

Everything seems fine for now. Then i validate the total of SumBudget per Date and compare the result given in 2nd tables and Final table :

Select
 Date,
 Sum(SumBudget) as TotalBudgetPerDate
from AggregatesBudget
 group by Date

For some dates it give me a different result when i do this :

Select
  Date,
  Sum( distinct SumBudget)  as TotalBudgetPerDate
from ConsolidationTable
group by Date

THE PROBLEMS :

  • The TotalBudgetPerDate in the AggregatesBudget Table is always what i expect so it's OK.
  • I have to add a distinct in my 2nd query because in consolidation table there is duplicated SumBudget

  • For most of Date I have same result with AggregatesBudget Table. But sometimes the result in the ConsolidationTable is not right for some dates.

I don't understand why!

NB: Here was fake data in reality I have 49Md rows. And I won't share real data because of sensitivity.


Solution

  • This will occur if you have duplicate values in the aggregatesbudget table on a single date (assuming you have no duplicate keys overall). You can check this using:

    select date, sumbudget, count(*)
    from aggregatesbudget
    group by date, sumbudget
    having count(*) > 1;
    

    You could fix this using window functions:

    select date, sum(sumbudget)
    from (select ct.*,
                 row_number() over (partition by Gender, Age, Income, State, date order by date) as seqnum
          from ConsolidationTable ct
         ) ct
    where seqnum = 1
    group by date;