Search code examples
sqlsql-servergroup-byderived-table

Clean Data Using SQL - Take Column Difference


I have data in SQL as follows:

Actual Table

+-------------+--------+------+
|     Id      | Weight | Type |
+-------------+--------+------+
| 00011223344 |     35 | A    |
| 00011223344 |     10 | A    |
| 12311223344 |    100 | B    |
| 00034343434 |     25 | A    |
| 00034343434 |     25 | A    |
| 99934343434 |    200 | C    |
| 88855667788 |    100 | D    |
+-------------+--------+------+

Column ID will always have length of 11 and has data type varchar. I need to create a column Actual Weight and Actual ID from the table above.

Actual Id is dependent on column ID. If the ID starts with 000 than we need to find ID from column ID that does not starts with 000 but characters after that (i.e. 8 characters from right) are similar. Matched ID would be the Actual Id. For example if we look at first 3 ids first 2 starts with 000 and another ID that does not starts with 000 and contains similar 8 characters from right can be found in 3rd row i.e. 12311223344 therefore in derived column Actual ID the first 2 rows would have Actual Id as 12311223344.

Actual Weight is dependent on values in 2 columns ID and Weight. We need to group column Id based on the criteria mentioned above if for any Id that does not starts with 000 but contains another entry that does starts with 000. Then we need to recalculate Weight for Id that does not starts with 000 by adding all Weights of ones starting with 000 and taking difference with one that does not starts with 000.

Example if we look at first 3 rows, in 3rd row we have Id starting with 123 and having entries that have 8 digits from right similar to this one except they start with 000 instead of 123 (i.e. row 1 and 2). For cases starting with 000 Actual Weight would be similar to Weight but for the one starting with 123 Actual Weight would be 100-(35+10)

I am looking for a query that can create these 2 derived column without need of creating any other table/view.

Desired Output

+-------------+-------------+--------+---------------+------+
|     Id      |  Actual ID  | Weight | Actual Weight | Type |
+-------------+-------------+--------+---------------+------+
| 00011223344 | 12311223344 |     35 |            35 | A    |
| 00011223344 | 12311223344 |     10 |            10 | A    |
| 12311223344 | 12311223344 |    100 |            55 | B    |
| 00034343434 | 99934343434 |     25 |            25 | A    |
| 00034343434 | 99934343434 |     25 |            25 | A    |
| 99934343434 | 99934343434 |    200 |           150 | C    |
| 88855667788 | 88855667788 |    100 |           100 | D    |
+-------------+-------------+--------+---------------+------+

Solution

  • Hmmmm . . . If I'm following this:

    select t.*,
           (case when id like '000%' then weight
                 else weight - sum(case when id like '000%' then weight else 0 end) over (partition by actual_id)
            end) as actual_weight
    from (select t.*,
                 max(id) over (partition by stuff(id, 1, 3, '')) as actual_id
          from t
         ) t;
    

    Here is a db<>fiddle.