Search code examples
sqlsql-servergroup-bycuberollup

SQL Server - Grouping Combination of possibilities by fixed value


enter image description here

I have to create cheapest basket which inculde fixed items.

For example for a basket which have (5) items

1 and 4 = (1 * 50) + (1 * 100) = 150

2 and 3 = (1 * 60) + (1 * 80) = 140 -- this is my guy

2 and 2 and 1 = (1 * 60) + (1 * 60) + (1 * 50) = 170

3 and 3 = (1 * 80) + (1 * 80) = 160 **** this 6 items but total item can exceed min items. The important thing is total cost... ....

Also this is valid for any number of items a basket may have. Also there are lots of stores and each stores have different package may include several items.

How can handle this issue with SQL?

UPDATE

Here is example data generation code. Recursive CTE solutions are more expensive. I should finish the job under 500-600ms over 600-700 stores each time. this is a package search engine. Manual scenario creation by using ´#temp´ tables or ´UNUION´ is 15-20 times cheaper then Recursive CTE.

Also concatenating Item or PackageId is very expensive. I can found required package id or item after selecting cheapest package with join to source table.

I am expecting a megical solution which can be ultra fast and get the correct option. Only cheapest basket required for each store. Manual scenario creation is very fast but sometimes fail for correct cheapest basket.

                CREATE TABLE #storePackages(
                StoreId int not null,
                PackageId int not null,
                ItemType int not null, -- there are tree item type 0 is normal item, 1 is item has discount 2 is free item
                ItemCount int not null,
                ItemPrice decimal(18,8) not null,
                MaxItemQouta int not null, -- in generaly a package can have between 1 and 6 qouata but in rare can up to 20-25
                MaxFullQouta int not null -- sometimes a package can have additional free or discount item qouta. MaxFullQouta will always greater then MaxItemQouta
            )

            declare @totalStores int
            set @totalStores = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 200 AND 400 ORDER BY NEWID())

            declare @storeId int;
            declare @packageId  int;
            declare @maxPackageForStore int;
            declare @itemMinPrice decimal(18,8);
            set @storeId = 1;
            set @packageId = 1

            while(@storeId <= @totalStores)
                BEGIN
                    set @maxPackageForStore = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 2 AND 6 ORDER BY NEWID())
                    set @itemMinPrice = (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 40 AND 100 ORDER BY NEWID())
                        BEGIN
                            INSERT INTO #storePackages
                            SELECT DISTINCT 
                             StoreId = @storeId
                            ,PackageId = CAST(@packageId + number AS int) 
                            ,ItemType = 0
                            ,ItemCount = number
                            ,ItemPrice = @itemMinPrice + (10 * (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN pkgNo.number AND pkgNo.number + 2  ORDER BY NEWID()))
                            ,MaxItemQouta = @maxPackageForStore
                            ,MaxFullQouta =  @maxPackageForStore + (CASE WHEN number > 1 AND number < 4 THEN 1 ELSE 0 END)
                            FROM master..[spt_values] pkgNo
                            WHERE number BETWEEN 1 AND @maxPackageForStore
                            UNION ALL
                            SELECT DISTINCT 
                             StoreId = @storeId
                            ,PackageId = CAST(@packageId + number AS int) 
                            ,ItemType = 1
                            ,ItemCount = 1
                            ,ItemPrice = (@itemMinPrice / 2) + (10 * (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN pkgNo.number AND pkgNo.number + 2  ORDER BY NEWID()))
                            ,MaxItemQouta = @maxPackageForStore
                            ,MaxFullQouta =  @maxPackageForStore  + (SELECT TOP 1 n = number FROM master..[spt_values] WHERE number BETWEEN 0 AND 2 ORDER BY NEWID())
                            FROM master..[spt_values] pkgNo
                            WHERE number BETWEEN 2 AND (CASE WHEN @maxPackageForStore > 4 THEN 4 ELSE @maxPackageForStore END)


                        set @packageId = @packageId + @maxPackageForStore;
                        END
                set @storeId =@storeId + 1;
                END

            SELECT * FROM #storePackages
            drop table #storePackages

MY SOLUTION

First of all I am thankful for everyone who try to help me. However all suggested solutions are based on CTE. As I said before recursive CTEs cause performace problems when hunderds of stores are considered. Also multiple packages are requested for one time. This means, I request can include mutiple baskets. One is 5 items other is 3 items and another one is 7 items...

Last Solution

First of all I generates all possible scenarios in a table by item size... By this way, I have option eleminate unwanted scenarios.

CREATE TABLE ItemScenarios(
   Item int,
   ScenarioId int,
   CalculatedItem int  --this will be joined with Store Item
)

Then I generated all possible scenario from 2 item to 25 item and insert to the ItemScenarios table. Scenarios can be genereated one time by using WHILE or recursive CTE. The advantage of this way, scenarios generated only for one time.

Resuls are like below.

Item          |   ScenarioId       |     CalculatedItem
--------------------------------------------------------
2                   1                     2
2                   2                     3
2                   3                     1
2                   3                     1
3                   4                     5
3                   5                     4
3                   6                     3
3                   7                     2
3                   7                     2
3                   8                     2
3                   8                     1
3                   9                     1
3                   9                     1
3                   9                     1
....
.....
......
25                  993                   10

By this way, I can restrict scenario sizes, Max different store, max different package etc.

Also I can eleminate some scenarios which matematically impossible cheapest then other. For example for 4 items request, some scenario

Scenario 1 : 2+2

Scenario 2: 2+1+1

Scenario 3: 1+1+1+1

Among these scenarios; It is impossible Scenario 2 would be cheapest basket. Because,

If Scenario 2 < Scenario 3 --> Scenario 1 would be lower then Scenario 2. Because the thing decreasing cost is 2 item price and **Scenario 1* have double 2 items

Also If Scenario 2 < Scenario 1 --> Scenario 3 would be lower then Scenario 2

Now, If I delete scenarios like Scenario 2 I would gain some performance advantages.

Now I can chose chepest item prices among stores

DECLARE @requestedItems int;
SET @requestedItems = 5;

CREATE TABLE #JoinedPackageItemWithScenarios(
   StoreId int not null,
   PackageId int not null,
   ItemCount int not null,
   ItemPrice decimal(18,8) 
   ScenarioId int not null,
)
INSERT INTO #JoinedPackageItemWithScenarios
 SELECT
    SPM.StoreId  
   ,SPM.PackageId 
   ,SPM.ItemCount 
   ,SPM.ItemPrice 
   ,SPM.ScenarioId
   FROM (
      SELECT 
            SP.StoreId  
           ,SP.PackageId 
           ,SP.ItemCount 
           ,SP.ItemPrice 
           ,SC.ScenarioId
           ,RowNumber = ROW_NUMBER() OVER (PARTITION BY SP.StoreId,SC.ScenarioId,SP.ItemCount ORDER BY SP.ItemPrice) 
      FROM ItemScenarios SC
      LEFT JOIN StorePackages AS SP ON SP.ItemCount = SC.CalculatedItem
      WHERE SC.Item = @requestedItems
 ) SPM
 WHERE SPM.RowNumber = 1

-- NOW I HAVE CHEAPEST PRICE FOR EACH ITEM, I CAN CREATE BASKET

 CREATE TABLE #selectedScenarios(
   StoreId int not null,
   ScenarioId int not null,
   TotalItem int not null,
   TotalCost decimal(18,8) 
)
 INSERT INTO #selectedScenarios
 SELECT 
      StoreId
     ,ScenarioId
     ,TotalItem 
     ,TotalCost 
  FROM (
     SELECT 
           StoreId
          ,ScenarioId
          --,PackageIds = dbo.GROUP_CONCAT(CAST(PackageId AS nvarchar(20))) -- CONCATENING PackageId decreasing performance here. We can joing seleceted scenarios with #JoinedPackageItemWithScenarios after selection complated.
          ,TotalItem = SUM(ItemCount)
          ,TotalCost = SUM(ItemPrice)
          ,RowNumber = ROW_NUMBER() OVER (PARTITION BY StoreId ORDER BY SUM(ItemPrice))
        FROM #JoinedPackageItemWithScenarios JPS
        GROUP BY StoreId,ScenarioId
        HAVING(SUM(ItemCount) >= @requestedItems)
     ) SLECTED
     WHERE RowNumber = 1

  -- NOW WE CAN POPULATE PackageIds if needed

  SELECT 
      SS.StoreId
     ,SS.ScenarioId
     ,TotalItem = MAX(SS.TotalItem)
     ,TotalCost = MAX(SS.TotalCost)
     ,PackageIds = dbo.GROUP_CONCAT(CAST(JPS.PackageId AS nvarchar(20)))
    FROM #selectedScenarios SS
    JOIN #JoinedPackageItemWithScenarios AS JPS ON JPS.StoreId = SS.StoreId AND JPS.ScenarioId = SS.ScenarioId
    GROUP BY SS.StoreId,SS.ScenarioId

SUM

In my test, this way is mimimum 10 times faster then recursive CTE, especially when number of stores and requested items increased. Also It gets 100% correct results. Because recursive CTE tried milions of unrequired JOINs when number of stores and requested items increased.


Solution

  • MY SOLUTION

    First of all I am thankful for everyone who try to help me. However all suggested solutions are based on CTE. As I said before recursive CTEs cause performace problems when hunderds of stores are considered. Also multiple packages are requested for one time. This means, A request can include mutiple baskets. One is 5 items other is 3 items and another one is 7 items...

    Last Solution

    First of all I generates all possible scenarios in a table by item size... By this way, I have option eleminate unwanted scenarios.

    CREATE TABLE ItemScenarios(
       Item int,
       ScenarioId int,
       CalculatedItem int  --this will be joined with Store Item
    )
    

    Then I generated all possible scenario from 2 item to 25 item and insert to the ItemScenarios table. Scenarios can be genereated one time by using WHILE or recursive CTE. The advantage of this way, scenarios generated only for one time.

    Resuls are like below.

    Item          |   ScenarioId       |     CalculatedItem
    --------------------------------------------------------
    2                   1                     2
    2                   2                     3
    2                   3                     1
    2                   3                     1
    3                   4                     5
    3                   5                     4
    3                   6                     3
    3                   7                     2
    3                   7                     2
    3                   8                     2
    3                   8                     1
    3                   9                     1
    3                   9                     1
    3                   9                     1
    ....
    .....
    ......
    25                  993                   10
    

    By this way, I can restrict scenario sizes, Max different store, max different package etc.

    Also I can eleminate some scenarios which matematically impossible cheapest then other. For example for 4 items request, some scenario

    Scenario 1 : 2+2

    Scenario 2: 2+1+1

    Scenario 3: 1+1+1+1

    Among these scenarios; It is impossible Scenario 2 would be cheapest basket. Because,

    If Scenario 2 < Scenario 3 --> Scenario 1 would be lower then Scenario 2. Because the thing decreasing cost is 2 item price and **Scenario 1* have double 2 items

    Also If Scenario 2 < Scenario 1 --> Scenario 3 would be lower then Scenario 2

    Now, If I delete scenarios like Scenario 2 I would gain some performance advantages.

    Now I can chose chepest item prices among stores

    DECLARE @requestedItems int;
    SET @requestedItems = 5;
    
    CREATE TABLE #JoinedPackageItemWithScenarios(
       StoreId int not null,
       PackageId int not null,
       ItemCount int not null,
       ItemPrice decimal(18,8) 
       ScenarioId int not null,
    )
    INSERT INTO #JoinedPackageItemWithScenarios
     SELECT
        SPM.StoreId  
       ,SPM.PackageId 
       ,SPM.ItemCount 
       ,SPM.ItemPrice 
       ,SPM.ScenarioId
       FROM (
          SELECT 
                SP.StoreId  
               ,SP.PackageId 
               ,SP.ItemCount 
               ,SP.ItemPrice 
               ,SC.ScenarioId
               ,RowNumber = ROW_NUMBER() OVER (PARTITION BY SP.StoreId,SC.ScenarioId,SP.ItemCount ORDER BY SP.ItemPrice) 
          FROM ItemScenarios SC
          LEFT JOIN StorePackages AS SP ON SP.ItemCount = SC.CalculatedItem
          WHERE SC.Item = @requestedItems
     ) SPM
     WHERE SPM.RowNumber = 1
    
    -- NOW I HAVE CHEAPEST PRICE FOR EACH ITEM, I CAN CREATE BASKET
    
     CREATE TABLE #selectedScenarios(
       StoreId int not null,
       ScenarioId int not null,
       TotalItem int not null,
       TotalCost decimal(18,8) 
    )
     INSERT INTO #selectedScenarios
     SELECT 
          StoreId
         ,ScenarioId
         ,TotalItem 
         ,TotalCost 
      FROM (
         SELECT 
               StoreId
              ,ScenarioId
              --,PackageIds = dbo.GROUP_CONCAT(CAST(PackageId AS nvarchar(20))) -- CONCATENING PackageId decreasing performance here. We can joing seleceted scenarios with #JoinedPackageItemWithScenarios after selection complated.
              ,TotalItem = SUM(ItemCount)
              ,TotalCost = SUM(ItemPrice)
              ,RowNumber = ROW_NUMBER() OVER (PARTITION BY StoreId ORDER BY SUM(ItemPrice))
            FROM #JoinedPackageItemWithScenarios JPS
            GROUP BY StoreId,ScenarioId
            HAVING(SUM(ItemCount) >= @requestedItems)
         ) SLECTED
         WHERE RowNumber = 1
    
      -- NOW WE CAN POPULATE PackageIds if needed
    
      SELECT 
          SS.StoreId
         ,SS.ScenarioId
         ,TotalItem = MAX(SS.TotalItem)
         ,TotalCost = MAX(SS.TotalCost)
         ,PackageIds = dbo.GROUP_CONCAT(CAST(JPS.PackageId AS nvarchar(20)))
        FROM #selectedScenarios SS
        JOIN #JoinedPackageItemWithScenarios AS JPS ON JPS.StoreId = SS.StoreId AND JPS.ScenarioId = SS.ScenarioId
        GROUP BY SS.StoreId,SS.ScenarioId
    

    SUM

    In my test, this way is mimimum 10 times faster then recursive CTE, especially when number of stores and requested items increased. Also It gets 100% correct results. Because recursive CTE tried milions of unrequired JOINs when number of stores and requested items increased.