Search code examples
sql-serverif-statementmathcaseparameterized-query

Using Arithmetic in SQL on my own columns to fill a third column where it is zero. (complicated, only when certain criteria is met)


So here is my question. Brace yourself as it takes some thinking just to wrap your head around what I am trying to do. I'm working with Quarterly census employment and wage data. QCEW data has something called suppression codes. If a data denomination (comes in overall, location quotient, and over the year each year each quarter) is suppressed, then all the data for that denomination is zero. I have my table set up in the following way (only showing you columns that are relevant for the question):

A County_Id column,              
Industry_ID column,  
Year column,  
Qtr column,

Suppressed column (0 for not suppressed and 1 for suppressed),
Data_Category column (1 for overall, 2 for lq, and 3 for over the year),
Data_Denomination column (goes 1-8 for what specific data is being looked at in that category ex: monthly employment,Taxable wage, etc. typical data), and a value column (which will be zero if the Data_Category is suppressed - since all the data denomination values will be zero).

Now, if Overall data (cat 1) for, say, 1991 quarter 1 is suppressed, but the next year quarter 1 has both overall and over the year (cats 1 and 3) NOT suppressed, then we can infer what the value would be for that first year's suppressed data, since OTY1991q1 = (Overall1991q1 - Overall1990q1). So to find that suppressed data we would just subtract our cat 1 (denom 1-8) values from our cat 3 (denom 1-8) values to replace the zeroes that are in our suppressed values from the year before. It's fairly easy to grasp mathematically, the difficulty is that there are millions of columns with which to check for these criteria. I'm trying to write some kind of SQL query that would do this for me, check to make sure Overall-n qtr-n is suppressed, then look to see if the next year isn't for both overall and oty, (in maybe some sort of complicated case statement? Then if those criteria are met, perform the arithmetic for the two Data_Cat-Data_Denom categories and replace the zero in the respective Cat-Denom values.

Below is a simple sample (non-relevant data_cats removed) that I hope will help get what I'm trying to do across.

|CountyID IndustryID Year Qtr Suppressed Data_Cat Data_Denom Value                                                                          
| 5            10      1990  1      1          1        1        0                                                                                                                                                     
| 5            10      1990  1      1          1        2        0                                                                                                                                                     
| 5            10      1990  1      1          1        3        0                                                                                                                                                     
| 5            10      1991  1      0          1        1        5                                                                                                                                                     
| 5            10      1991  1      0          1        2        15                                                                                                                                                    
| 5            10      1991  1      0          1        3        25                                                                                                                                                    
| 5            10      1991  1      0          3        1        20                                                                                                                                                    
| 5            10      1991  1      0          3        2        20                                                                                                                                                    
| 5            10      1991  1      0          3        3        35  

So basically what we're trying to do here is take the overall data from each data category (I removed lq ~ data_cat 2) because it isn't relevant and data_denom (which I've narrowed down from 8 to 3 for simplicity) in 1991, subtract it from the overall 1991 value and that will give you the applicable
| value for the previous year's 1990 cat_1. So here data_cat 1 Data_denom 1 would be 15 (20-5), denom 2 would be 5(20-15), and denom 3 would be 10(35-25). (Oty 1991q1 - overall 1991q1) = 1990q1. I hope this helps. Like I said the problem isn't the math it's formulating a query that will check this criteria millions and millions of times.


Solution

  • If you want to find supressed data that has 2 rows of unsupressed data for the next year and quarter, we could use cross apply() to do something like this:

    test setup: http://rextester.com/ORNCFR23551

    using cross apply() to return rows with a valid derived value:

    select t.*
      , NewValue = cat3.value - cat1.value
    from t
      cross apply (
          select i.value 
          from t as i
          where i.CountyID   = t.CountyID
            and i.IndustryID = t.IndustryID
            and i.Data_Denom = t.Data_Denom 
            and i.Year       = t.Year +1
            and i.Qtr        = t.Qtr
            and i.Suppressed = 0
            and i.Data_Cat   = 1
      ) cat1
      cross apply (
          select i.value 
          from t as i
          where i.CountyID   = t.CountyID
            and i.IndustryID = t.IndustryID
            and i.Data_Denom = t.Data_Denom 
            and i.Year       = t.Year +1
            and i.Qtr        = t.Qtr 
            and i.Suppressed = 0
            and i.Data_Cat   = 3
      ) cat3
    where t.Suppressed = 1
      and t.Data_Cat   = 1
    

    returns:

    +----------+------------+------+-----+------------+----------+------------+-------+----------+
    | CountyID | IndustryID | Year | Qtr | Suppressed | Data_Cat | Data_Denom | Value | NewValue |
    +----------+------------+------+-----+------------+----------+------------+-------+----------+
    |        5 |         10 | 1990 |   1 |          1 |        1 |          1 |     0 |       15 |
    |        5 |         10 | 1990 |   1 |          1 |        1 |          2 |     0 |        5 |
    |        5 |         10 | 1990 |   1 |          1 |        1 |          3 |     0 |       10 |
    +----------+------------+------+-----+------------+----------+------------+-------+----------+
    


    Using outer apply() to return all rows

    select t.*
      , NewValue = coalesce(nullif(t.value,0),cat3.value - cat1.value,0)
    from t
      outer apply (
          select i.value 
          from t as i
          where i.CountyID   = t.CountyID
            and i.IndustryID = t.IndustryID
            and i.Data_Denom = t.Data_Denom 
            and i.Year       = t.Year +1
            and i.Qtr        = t.Qtr
            and i.Suppressed = 0
            and i.Data_Cat   = 1
      ) cat1
      outer apply (
          select i.value 
          from t as i
          where i.CountyID   = t.CountyID
            and i.IndustryID = t.IndustryID
            and i.Data_Denom = t.Data_Denom 
            and i.Year       = t.Year +1
            and i.Qtr        = t.Qtr
            and i.Suppressed = 0
            and i.Data_Cat   = 3
      ) cat3
    

    returns:

    +----------+------------+------+-----+------------+----------+------------+-------+----------+
    | CountyID | IndustryID | Year | Qtr | Suppressed | Data_Cat | Data_Denom | Value | NewValue |
    +----------+------------+------+-----+------------+----------+------------+-------+----------+
    |        5 |         10 | 1990 |   1 |          1 |        1 |          1 |     0 |       15 |
    |        5 |         10 | 1990 |   1 |          1 |        1 |          2 |     0 |        5 |
    |        5 |         10 | 1990 |   1 |          1 |        1 |          3 |     0 |       10 |
    |        5 |         10 | 1991 |   1 |          0 |        1 |          1 |     5 |        5 |
    |        5 |         10 | 1991 |   1 |          0 |        1 |          2 |    15 |       15 |
    |        5 |         10 | 1991 |   1 |          0 |        1 |          3 |    25 |       25 |
    |        5 |         10 | 1991 |   1 |          0 |        3 |          1 |    20 |       20 |
    |        5 |         10 | 1991 |   1 |          0 |        3 |          2 |    20 |       20 |
    |        5 |         10 | 1991 |   1 |          0 |        3 |          3 |    35 |       35 |
    +----------+------------+------+-----+------------+----------+------------+-------+----------+