Search code examples
sqlcursorsql-server-2014

SQL Calculating Median per Product in Cursor


I'm trying to calculate the mediam per Product (SKU) within a selected list but its taking hours to complete. I receive a list contining 4 columns (SKU, Branch, PriceList, Campaign) and I need to import a new data row per SKU with the mediam of the SKU as its PriceList.

To achive this, I import the unique list of SKUs into a TEMP TABLE, then create a CURSOR to run through the TEMP TABLE, calculate the median within the CURSOR.

DECLARE @c BIGINT
DECLARE @SKUVal FLOAT
DECLARE @Total int

SELECT DISTINCT SKU, 
                'DFLT' AS Branch,
                0 AS PriceList,
                NULL AS Campaign
INTO #SKU
FROM eCommerce_Pricing
WHERE PriceList > 0.0

SELECT @Total = COUNT(*) FROM #SKU

DECLARE Cur1 CURSOR LOCAL SCROLL DYNAMIC READ_ONLY
FOR
SELECT  SKU
FROM #SKU

OPEN Cur1
    FETCH NEXT FROM Cur1 INTO
    @SKU

    WHILE @@FETCH_STATUS = 0
    BEGIN

        SET @c = (SELECT COUNT(*) FROM eCommerce_Pricing WHERE SKU = @SKU)

        SELECT @SKUVal =  AVG(1.0 * PriceList)
        FROM (
            SELECT PriceList FROM dbo.eCommerce_Pricing
            WHERE SKU = @SKU
             ORDER BY PriceList
             OFFSET (@c - 1) / 2 ROWS
             FETCH NEXT 1 + (1 - @c % 2) ROWS ONLY
        ) AS x

        UPDATE #SKU
        SET PriceList = @SKUVal
        WHERE SKU = @SKU

        SET @Total = @Total-1
        PRINT CAST(@Total AS VARCHAR(100)) + ' records left'

    FETCH NEXT FROM Cur1 INTO
            @SKU
            END
    DEALLOCATE Cur1

SELECT * FROM #SKU
DROP TABLE #SKU

Any advise would be greatly appreciated. I am using SQL Sever 2014.


Solution

  • If you want the medians, use percentile_cont() or percentile_disc(). From what I can guess from your question, you want something like this:

    select distinct sku,
           percentile_disc(0.5) within group (order by price) over (partition by sku) as median
    from dbo.eCommerce_Pricing;
    

    Hopefully, you have learned that cursors are painfully slow and not the best approach to almost any problem in SQL (well, they do have their uses, but not for any functionality already in the database).