Search code examples
sqlsql-servert-sqlwindow-functions

Running sum excluding rows with duplicate column value


An example table:

video encoding video time spent encoding bytes encoding bytes running sum video time spent running sum (expected) video time spent running sum (actual)
A 1 1 500 500 1 1
A 2 1 400 900 1 2
B 3 2 300 1200 3 5
B 4 2 200 1400 3 8
B 5 2 100 1500 3 11
B 6 2 100 1600 3 14
  • video time spent column has how much time a video is watched; which encoding was watched is not important.
  • video time spent running sum is what I am trying to get. It should only sum time spent at video level, ignoring encodings.

I want to select as much encoding bytes as possible while staying under sum of video time spent < X.

My query so far:

SELECT *
FORM (
   SELECT 
      ...,
      SUM(encoding_bytes) OVER(ORDER BY encoding_bytes desc) AS encoding_bytes_running_sum, 
      SUM(video_time_spent) OVER (ORDER BY encoding_bytes desc) AS video_time_spent_running_sum
      ...
) 
WHERE video_time_spent_running_sum < X

but video_time_spent_running_sum isn't smart enough to skip over other encodings within the same video. What would be the best way to do this?

the number of encodings per video is not constant.

script to create table:

SELECT
    *,
    SUM(encoding_bytes) OVER(
        ORDER BY
            encoding_bytes DESC
    ) AS encoding_bytes_running_sum,
    SUM(video_time_spent) OVER (
        ORDER BY
            encoding_bytes DESC ROWS UNBOUNDED PRECEDING
    ) AS video_time_spent_running_sum
FROM (
    VALUES
        ('a', 1, 1, 500),
        ('a', 2, 1, 400),
        ('b', 3, 2, 300),
        ('b', 4, 2, 200),
        ('b', 5, 2, 100),
        ('b', 6, 2, 100)
) AS t (video, encoding, video_time_spent, encoding_bytes)

Solution

  • One way to do this is as follows (I am sure its possible to simplify); where you use the ROW_NUMBER function to only count the first row of each video.

    WITH cte AS (
        SELECT
            *
            , SUM(encoding_bytes) OVER (ORDER BY encoding_bytes DESC) AS encoding_bytes_running_sum
            --, SUM(video_time_spent) OVER (ORDER BY encoding_bytes DESC ROWS UNBOUNDED PRECEDING) AS video_time_spent_running_sum
            , ROW_NUMBER() OVER (PARTITION BY video ORDER BY video, [encoding]) rn
        FROM (
            VALUES
                ('a', 1, 1, 500),
                ('a', 2, 1, 400),
                ('b', 3, 2, 300),
                ('b', 4, 2, 200),
                ('b', 5, 2, 100),
                ('b', 6, 2, 100)
        ) AS t (video, [encoding], video_time_spent, encoding_bytes)
    )
    SELECT video, [encoding], video_time_spent, encoding_bytes, encoding_bytes_running_sum
        , SUM(CASE WHEN rn = 1 THEN video_time_spent ELSE 0 END) OVER (ORDER BY video ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) video_time_spent_running_sum
    FROM cte;
    

    This returns:

    video encoding video_time_spent encoding_bytes encoding_bytes_running_sum video_time_spent_running_sum
    a 1 1 500 500
    a 2 1 400 900
    b 3 2 300 1200
    b 4 2 200 1400
    b 5 2 100 1600
    b 6 2 100 1600