I have around 4,000 files of wildly different sizes that I am trying to back up as efficiently as is reasonably possible. I know compressing them all into a giant tarball and splitting it evenly is one solution, but as I am using Blu-ray discs, if I scratch one section, I risk losing the whole disc's contents.
I wrote a Python script to put all the files (along with their sizes) into a list. I take the biggest file first, then either add the next biggest (if the total is still under 25 GB) or move down the list until I find one that fits, repeating until I hit the size limit, then start over with the next biggest remaining file.
This works reasonably well, but it gets really ragged at the end, and I end up using about 15 more discs than the theoretical minimum.
Anyone have a better method I'm not aware of? (This seems like a Google coding interview question lol.) I don't need it to be perfect, I just want to make sure I'm not doing this stupidly before I run through this giant stack of non-cheap BD-Rs. I've included my code for reference.
#!/usr/bin/env python3
import os
import sys

# Max size per disc (25 GB single-layer BD-R)
pmax = 25000000000

# Walk the directory and collect (size, path) pairs
walkdir = os.path.realpath(sys.argv[1])
flist = []
for root, directories, filenames in os.walk(walkdir):
    for filename in filenames:
        f = os.path.join(root, filename)
        fsize = os.path.getsize(f)
        flist.append((fsize, f))

# Biggest files first
flist.sort(reverse=True)

running_total = 0
running_list = []
groups = []
while flist:
    # One pass per disc: grab every remaining file that still fits,
    # keep the rest for the next disc
    leftover = []
    for fsize, fname in flist:
        if running_total + fsize < pmax:
            running_list.append(fname)
            running_total += fsize
        else:
            leftover.append((fsize, fname))
    flist = leftover
    groups.append(running_list)
    running_list = []
    running_total = 0

print('This will take {} discs.'.format(len(groups)))
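The script takes the top-level directory to back up as its only argument.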
I just brute-forced it by going down the list, adding smaller and smaller files until I ran out of files or the disc filled up, then repeating. By "mathematically required" I just meant total size of all files / 25 GB = ideal number of discs. I can post the resulting arrays on
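For reference, here's a minimal sketch of that lower-bound calculation, assuming the same flist of (size, path) tuples and the same 25 GB pmax from the script above (taken before the packing loop empties flist):

# Theoretical minimum: ceil(total size of all files / disc capacity).
# Assumes the flist of (size, path) tuples and pmax from the script above,
# captured before the packing loop consumes flist.
total_size = sum(fsize for fsize, fpath in flist)
min_discs = -(-total_size // pmax)  # integer ceiling division
print('{} bytes total -> at least {} discs, no matter how they are packed.'.format(total_size, min_discs))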