Search code examples
sqlt-sql

Distribute sequential SQL results evenly based on count


I have SQL results that I need to break into item ranges and the count distributed evenly across a number of tasks. What is a good way to do this?

My data looks like this.

+------+-------+----------+
| Item | Count | ItmGroup |
+------+-------+----------+
| 1A   |   100 |        1 |
| 1B   |    25 |        1 |
| 1C   |     2 |        1 |
| 1D   |     6 |        1 |
| 2A   |    88 |        2 |
| 2B   |    10 |        2 |
| 2C   |   122 |        2 |
| 2D   |    12 |        2 |
| 3A   |     4 |        3 |
| 3B   |   103 |        3 |
| 3C   |     1 |        3 |
| 3D   |    22 |        3 |
| 4A   |    55 |        4 |
| 4B   |    42 |        4 |
| 4C   |   100 |        4 |
| 4D   |     1 |        4 |
+------+-------+----------+

Item = the item code. Count = this context it is determining the popularity of the item. This can be used to RANK items if need be. ItmGroup - this is a parent value for the Itm column. Item is contained in a Group.

What differentiates this from other similar questions I'veviewed is that the ranges I need to determine cannot be taken out of the order they show in this table. We can do Item Range from A1 to B3, in other words, they can cross over ItmGroups, but they must remain in alphanumeric order by Item.

The expected result would be item ranges that evenly distribute the total count.

+------+-------+----------+
| FrItem | ToItem | TotCount|
+------+-------+----------+
| 1A   |   2D  |      134 |
| 3A   |   3D  |      130 |
(etc)

Solution

  • Provided you've happy with a rough estimate, this will split the data in to two groups.

    The first group will always have as many records as possible, but no more than half of the total count (and group 2 will have the rest).

    WITH
      cumulative AS
    (
      SELECT
        *,
        SUM([Count]) OVER (ORDER BY Item)   AS cumulativeCount,
        SUM([Count]) OVER ()                AS totalCount
      FROM
        yourData
    )
    SELECT
      MIN(item)    AS frItem,
      MAX(item)    AS toItem,
      SUM([Count]) AS TotCount
    FROM
      cumulative
    GROUP BY
      CASE WHEN cumulativeCount <= totalCount / 2 THEN 0 ELSE 1 END
    ORDER BY
      CASE WHEN cumulativeCount <= totalCount / 2 THEN 0 ELSE 1 END
    

    To split the data in to 5 portions, it's similar...

    GROUP BY
      CASE WHEN cumulativeCount <= totalCount * 1/5 THEN 0
           WHEN cumulativeCount <= totalCount * 2/5 THEN 1
           WHEN cumulativeCount <= totalCount * 3/5 THEN 2
           WHEN cumulativeCount <= totalCount * 4/5 THEN 3
                                                    ELSE 4 END
    

    Depending on your data this isn't necessarily ideal

     Item | Count       GroupAsDefinedAbove   IdealGroup
    ------+-------
      1A  |   4              1                  1
      2A  |   5              2                  1
      3A  |   8              2                  2
    

    If you want something that can get the two groups as close in size as possible, that's a lot more complex.