Tags: python, file-io, split, directory, filesize

Split set of files based on size in MB with Python


Is there a way to write a Python function that walks a folder of files and separates them into "partitions" (which will become folders) based on the total size, in megabytes, of the files in each partition? I'm not sure where to start or what to do first.


Solution

  • Assuming you want a starting point rather than a canned solution:

    • Use os.walk to scan a whole directory tree. If you only need to scan a single folder rather than a whole tree, os.scandir (new in Python 3.5) is a simple optimization, particularly on Windows: there it gives you stat info for free, and on *NIX systems it makes that info available as a lazily cached value. On earlier versions of Python, the third-party scandir module on PyPI provides the same interface.
    • If you are not using os.scandir, use os.stat to get file sizes.
    • Use a collections.defaultdict(set) to map each file size in MB to the set of files that round to that size (or just process the files as you go instead of storing them in a container at all). Alternatively, sort the files with sorted, keyed on size, and use itertools.groupby (at whatever MB granularity you like) to group the results.
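
The steps above can be sketched roughly as follows. This is a minimal starting point, not a complete solution: it scans a single folder with os.scandir, then groups files by rounded size with a defaultdict(set). The helper names (file_sizes, group_by_mb, partition_by_total_size) are made up for illustration, "MB" is assumed to mean 2**20 bytes, and the greedy packing in partition_by_total_size is not part of the answer above. It is one simple guess at what "partitions based on total size" could mean.

```python
import os
from collections import defaultdict

MB = 1024 * 1024  # assumption: one "MB" is 2**20 bytes


def file_sizes(folder):
    """Yield (path, size_in_bytes) for each regular file directly in folder."""
    for entry in os.scandir(folder):
        # Skip subdirectories and symlinks; only plain files are counted.
        if entry.is_file(follow_symlinks=False):
            yield entry.path, entry.stat(follow_symlinks=False).st_size


def group_by_mb(sized_files):
    """Map each size in MB (rounded) to the set of paths that round to it."""
    groups = defaultdict(set)
    for path, size in sized_files:
        groups[round(size / MB)].add(path)
    return groups


def partition_by_total_size(sized_files, limit_mb):
    """Greedily pack files, in input order, into lists of paths.

    A new partition is started whenever adding the next file would push
    the running total past limit_mb. A single file larger than the limit
    ends up alone in its own partition.
    """
    limit = limit_mb * MB
    partitions, current, current_size = [], [], 0
    for path, size in sized_files:
        if current and current_size + size > limit:
            partitions.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        partitions.append(current)
    return partitions
```

For example, partition_by_total_size(file_sizes("some_folder"), 100) would return lists of paths of at most roughly 100 MB each (the folder name is a placeholder). Each list could then be moved into its own folder with shutil.move. Greedy packing in input order is simple but not optimal; sorting the files by size first may give tighter partitions.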