Search code examples
sqlapache-spark-sqldatabricksazure-databricks

Databricks : Equivalent code for SQL query


I'm looking for the equivalent databricks code for the query. I added some sample code and the expected as well, but in particular I'm looking for the equivalent code in Databricks for the query. For the moment I'm stuck on the CROSS APPLY STRING SPLIT part.

Sample SQL data:

    CREATE TABLE FactTurnover
    (
    ID INT,
    SalesPriceExcl NUMERIC (9,4),
  Discount VARCHAR(100)
  )
 INSERT INTO FactTurnover
 VALUES 
   (1, 100, '10'),
   (2, 39.5877, '58, 12'),
   (3, 100, '50, 10, 15'),
   (4, 100, 'B')

Query:

    ;WITH CTE AS
    (
     SELECT Id, SalesPriceExcl, 
         CASE WHEN value = 'B' THEN 0
         ELSE CAST(value as int) END AS Discount
     From FactTurnover
     CROSS APPLY STRING_SPLIT(Discount, ',')
     )
     SELECT Id,  
       Min(SalesPriceExcl) AS SalesPriceExcludingDiscount,
       EXP(SUM(LOG((100 - Discount) / 100.0))) As TotalDiscount,
       Cast(EXP(SUM(LOG((100 - Discount) / 100.0))) * 
            MIN(SalesPriceExcl) As Numeric(9,2))
        PriceAfterDiscount
     FROM CTE
     GROUP BY ID

Expected Results:

| Id | SalesPriceExcludingDiscount |       TotalDiscount | PriceAfterDiscount |

|----|-----------------------------|---------------------|--------------------|

|  1 |                         100 |                 0.9 |                 90 |

|  2 |                     39.5877 | 0.36960000000000004 |              14.63 |

|  3 |                         100 | 0.38250000000000006 |              38.25 |

|  4 |                         100 |                   1 |                100 |

Solution

  • Use SPLIT to convert the comma-separated string to an array then use LATERAL VIEW and EXPLODE to do operations on the elements of that array. The roughly equivalent syntax (including CTEs) is:

    %sql
    --SELECT * FROM FactTurnover;
    
    WITH cte AS
    (
    SELECT *
    FROM
      (
      SELECT Id, SalesPriceExcl, SPLIT ( Discount, ',' ) AS discountArray
      FROM FactTurnover
      ) x
      LATERAL VIEW EXPLODE ( discountArray ) x AS xdiscount
    )
    SELECT 
      Id,
      MIN(SalesPriceExcl) AS SalesPriceExcludingDiscount,
      EXP ( SUM( LOG( ( 100 - xdiscount ) / 100.00 ) ) ) AS TotalDiscount
    FROM cte
    GROUP BY Id
    ORDER BY Id
    

    If you are feeling brave, you could also do this using higher order functions. I've included two examples below. I would say these are harder to debug and you should probably try them performance-wise, it depends what you're comfortable with:

    %sql
    -- Convert Discount text column to array with SPLIT function and filter out value 'B' from the array
    ;WITH filterB AS (
    SELECT *, FILTER ( SPLIT ( Discount, ',' ), x -> x != 'B' ) discountArray
    FROM FactTurnover
    ), cte1 AS (
    -- Do initial calcs on array
    SELECT 
      Id,
      TRANSFORM ( discountArray, discountArray -> LOG( ( 100 - discountArray ) / 100.00 ) ) discountArray2
    FROM filterB
    )
    SELECT
      Id,
      EXP( AGGREGATE ( discountArray2, CAST( 0 AS DOUBLE ), ( x, y ) -> x + y ) ) AS x
    FROM cte1;
    
    -- all in one example
    SELECT 
        Id,
        EXP( AGGREGATE( TRANSFORM( FILTER ( SPLIT ( Discount, ',' ), x -> x != 'B' ), y -> LOG( ( 100 - y ) / 100.00 ) ), CAST( 0 AS DOUBLE ), ( z, a ) -> z + a ) )
        AS final
    FROM FactTurnover
    ORDER BY Id