Search code examples
sqlsummary

SQL Query needs revision


I have the following query :

With Summary as (
Select a.ASSN As Association
      ,SUM(tonnage_adjusted) as TotalTonnage
      ,SUM(case when remarks = '' THEN tonnage_adjusted ELSE NULL END) as Monitored
      ,SUM(case when remarks = 'NO_TICKET' THEN tonnage_adjusted ELSE NULL END) as NotMonitored
From DeliveryTons d INNER JOIN FarmerGroups a ON d.reaping_code = a.REAPING_GROUP_CODE 
WHERE reaping_code IS NOT NULL AND d.crop_season = 4 
Group By a.ASSN
)

SELECT Association
  ,TotalTonnage
  , COALESCE(Monitored,0) As Monitored
  , COALESCE(NotMonitored,0)As NotMonitored
  , COALESCE(((Monitored/TotalTonnage) * 100),0) as pct_Monitored
FROM Summary 
Order by Association,TotalTonnage

And following table examples:

DeliveryTons

id | parcel_id | crop_season | tonnage_adjusted | reaping_code | remarks 
1    012-0075       4               25.60            NSCGA12     
2    011-0089       3               17.58            PSCPA      NO_TICKET 
3    001-0541       3               14.58            PSCPA       
4                   2               18.50            NSCGA12    NO_TICKER

FarmerGroups

ID | ASSN  |   REAPING_GROUP_CODE  |  CROP_SEASON
1    CSCPA        NSCGA12                  4
2    PSCPA        PSCPA                    3
3    PSCPA        NSCGA12                  4

I want to get the tonnage_adjusted totals by association, however I seem to be getting higher sums of tonnage_adjusted than are actually there by crop_season. Almost as pulling data from other crop_seasons. Can anyone see if there is anything wrong with the sql?


Solution

  • It seems that the join pulls more than 1 matching row from the FarmGroups table, resulting in a cross product.

    I guess that it should not, but now you have multiple entries for the same ReapingCode/Season combination.

    Maybe you want to cleanup your data and add an UNIQUE KEY(REAPING_GROUP_CODE, CROP_SEASON) to avoid such duplications in the future. If I understand correctly, a single REAPING_GROUP_CODE in a specific SEASON can only be accounted to one ASSN.

    If thats not possible, you can still do this to eliminate the duplicate on query time, but it will not perform as good as a cleanup + schema change:

    With Summary as (
    Select a.ASSN As Association
          ,SUM(tonnage_adjusted) as TotalTonnage
          ,SUM(case when remarks = '' THEN tonnage_adjusted ELSE NULL END) as Monitored
          ,SUM(case when remarks = 'NO_TICKET' THEN tonnage_adjusted ELSE NULL END) as NotMonitored
    FROM DeliveryTons d 
        /* NOTE the SELECT DISTINCT subquery here */
        INNER JOIN 
        (SELECT DISTINCT 
             ASSN,REAPING_CODE_GROUP,CROP_SEASON FROM FarmerGroups 
             WHERE CROP_SEASON = 4
        ) a 
        ON d.reaping_code = a.REAPING_GROUP_CODE 
        AND d.crop_season = a.CROP_SEASON
    WHERE reaping_code IS NOT NULL 
    Group By a.ASSN
    )
    
    SELECT Association
      ,TotalTonnage
      , COALESCE(Monitored,0) As Monitored
      , COALESCE(NotMonitored,0)As NotMonitored
      , COALESCE(((Monitored/TotalTonnage) * 100),0) as pct_Monitored
    FROM Summary 
    Order by Association,TotalTonnage