Search code examples
pythonmysqldatabricksdatabricks-sql

Cummulative substraction of 2 values using SQL or Python


rake_name origin destination created_At bpc_issue_date departure_time journey_distance bpc_balance_distance_km abc rn logic
ARIL-05 a b 09-02-2024 06:37 08-02-2024 18:30 09-02-2024 13:40 173.4600067 5271 5097.539993 1 5271-173.4600067
ARIL-05 a b 11-02-2024 02:58 08-02-2024 18:30 11-02-2024 04:15 1099.369995 5271 3998 2 5097-1099
ARIL-05 a b 14-02-2024 12:46 08-02-2024 18:30 14-02-2024 15:50 1099.369995 5271 2898 3 3998-1099
ARIL-05 a b 16-02-2024 17:29 08-02-2024 18:30 17-02-2024 02:30 175.6600037 5271 2713 4 2898-175
ARIL-05 a b 17-02-2024 10:54 08-02-2024 18:30 18-02-2024 11:30 1178.880005 5271 1544 5 2713-1178
ARIL-05 a b 20-02-2024 11:44 08-02-2024 18:30 20-02-2024 15:20 1179.369995 5271 364 6 1544-1179

Please look at the logic column for substraction i want.

so i want to update abc column with difference between bpc_balance_distance_km and journey_distance once i get that value i want to substract below abc values with the result of abc - respective journey_distance.

i have tried sql, however i am not sure how can i populate row >1, am able to populate 1st row using case when rn= 1 then bpc_balance_km - journey_distance

i have tried following query however, i awnt to covert this query int databricks sql supported query. this query works

SET @newabc := 0;

WITH cte AS (
 SELECT *,
       CASE WHEN rn=1 THEN @newabc := bpc_balance_distance_km-journey_distance
         ELSE @newabc := @newabc-journey_distance END AS nabc
 FROM derived_abc ORDER BY rn)
 SELECT *, ROUND(nabc) AS final_result
 FROM cte;

any suggastion in python or sql will be helpful. here is DB fiddle link : https://dbfiddle.uk/JBzaylSb i am using databricks sql


Solution

  • Here, try this:

    SET @newabc := 0;
    
    WITH cte AS (
     SELECT *,
           CASE WHEN rn=1 THEN @newabc := bpc_balance_distance_km-journey_distance
             ELSE @newabc := @newabc-journey_distance END AS nabc
     FROM derived_abc ORDER BY rn)
     SELECT *, ROUND(nabc) AS final_result
     FROM cte;
    

    In the query above, the value of bpc_balance_distance_km-journey_distance for rn=1 is assigned to the @newabc variable. Then for rows where rn > 1, it'll take the current assigned @newabc value to subtract with the row's journey_distance value.

    There'll be a slight different result in which causes by what I already point out in the comment - the inconsistencies of the decimal treatment. You can see the example in this demo fiddle.