I have the following query :
With Summary as (
Select a.ASSN As Association
,SUM(tonnage_adjusted) as TotalTonnage
,SUM(case when remarks = '' THEN tonnage_adjusted ELSE NULL END) as Monitored
,SUM(case when remarks = 'NO_TICKET' THEN tonnage_adjusted ELSE NULL END) as NotMonitored
From DeliveryTons d INNER JOIN FarmerGroups a ON d.reaping_code = a.REAPING_GROUP_CODE
WHERE reaping_code IS NOT NULL AND d.crop_season = 4
Group By a.ASSN
)
SELECT Association
,TotalTonnage
, COALESCE(Monitored,0) As Monitored
, COALESCE(NotMonitored,0)As NotMonitored
, COALESCE(((Monitored/TotalTonnage) * 100),0) as pct_Monitored
FROM Summary
Order by Association,TotalTonnage
And following table examples:
DeliveryTons
id | parcel_id | crop_season | tonnage_adjusted | reaping_code | remarks
1 012-0075 4 25.60 NSCGA12
2 011-0089 3 17.58 PSCPA NO_TICKET
3 001-0541 3 14.58 PSCPA
4 2 18.50 NSCGA12 NO_TICKER
FarmerGroups
ID | ASSN | REAPING_GROUP_CODE | CROP_SEASON
1 CSCPA NSCGA12 4
2 PSCPA PSCPA 3
3 PSCPA NSCGA12 4
I want to get the tonnage_adjusted totals by association, however I seem to be getting higher sums of tonnage_adjusted than are actually there by crop_season. Almost as pulling data from other crop_seasons. Can anyone see if there is anything wrong with the sql?
It seems that the join pulls more than 1 matching row from the FarmGroups table, resulting in a cross product.
I guess that it should not, but now you have multiple entries for the same ReapingCode/Season combination.
Maybe you want to cleanup your data and add an UNIQUE KEY(REAPING_GROUP_CODE, CROP_SEASON)
to avoid such duplications in the future. If I understand correctly, a single REAPING_GROUP_CODE in a specific SEASON can only be accounted to one ASSN.
If thats not possible, you can still do this to eliminate the duplicate on query time, but it will not perform as good as a cleanup + schema change:
With Summary as (
Select a.ASSN As Association
,SUM(tonnage_adjusted) as TotalTonnage
,SUM(case when remarks = '' THEN tonnage_adjusted ELSE NULL END) as Monitored
,SUM(case when remarks = 'NO_TICKET' THEN tonnage_adjusted ELSE NULL END) as NotMonitored
FROM DeliveryTons d
/* NOTE the SELECT DISTINCT subquery here */
INNER JOIN
(SELECT DISTINCT
ASSN,REAPING_CODE_GROUP,CROP_SEASON FROM FarmerGroups
WHERE CROP_SEASON = 4
) a
ON d.reaping_code = a.REAPING_GROUP_CODE
AND d.crop_season = a.CROP_SEASON
WHERE reaping_code IS NOT NULL
Group By a.ASSN
)
SELECT Association
,TotalTonnage
, COALESCE(Monitored,0) As Monitored
, COALESCE(NotMonitored,0)As NotMonitored
, COALESCE(((Monitored/TotalTonnage) * 100),0) as pct_Monitored
FROM Summary
Order by Association,TotalTonnage