I face a problem that could be not very difficult in a first approach but it's not as easy as it could be. I have a list of files with their sizes in Mbytes that a list from a DB table.
Example:
SELECT file_name, file_size
FROM myfiles
Result:
file_name | file_size
----------------------
file 1 | 14
file 2 | 5
file 3 | 20
file 4 | 6
I want to implement a SQL query that group files by packet to send to an API. Files should be regrouped by packet of 20 Mb maximum. The best solution would be to return the "sending index" in a third column. Something like this:
file_name | file_size | packet_index
-------------------------------------
file 1 | 14 | 1
file 2 | 5 | 1
file 3 | 20 | 2
file 4 | 6 | 3
The first send will contain file 1 & 2, the second file 3 and the last one file 4. How can I determine this information in SQL please ?
Unfortunately, this type of problem requires a recursive CTE. You are assigning groups in order, so:
with tt as (
select t.*, row_number() over (order by file_name) as seqnum
from myfiles t
),
cte as (
select file_name, file_size, file_size as total, seqnum, 1 as grp
from tt
where tt.seqnum = 1
union all
select tt.filename, tt.filesize,
(case when tt.filesize + cte.total > 20
then tt.filesize
else tt.filesize + cte.total
end),
tt.seqnum,
(case when tt.filesize + cte.total > 20
then cte.grp + 1
else cte.grp
end)
from cte join
tt
on tt.seqnum = cte.seqnum + 1
)
select *
from cte;
Here is a db<>fiddle.
If you have more than 100 rows, then add OPTION (MAXRECURSION 0)
.