
Finding Large Files in a Mercurial Repository


Similar to this link, but for Mercurial. I'd like to find the files that contribute most to the size of my Mercurial repository.

I intend to use hg convert to create a new, smaller repository. I'm just not sure yet which files are contributing to the repository size. They could be files that have already been deleted.

What is a good way to find these anywhere in the repository history? There are over 20,000 commits. I'm thinking of a PowerShell script, but I'm not sure what the best way to go about this is.


Solution

  • Check hg help fileset. Something like

    hg files "set:size('>1M')"
    

    should do the trick for you. It only operates on one revision, though, so you might need to run it over all revisions. In bash I'd try something like

    for i in $(hg log -r "all()" "set:size('>400k')" --template "{rev}\n"); do
        hg files -r "$i" "set:size('>400k')"
    done | sort -u
    

    It can probably be optimized, since it currently does some duplicate work and can run for quite a while; on the OpenTTD repository, with 22,000 commits, it took just under 10 minutes on my laptop.

    (Also check hg help for templates, files and grep.)
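
If you want to reuse the loop above with different thresholds, it could be wrapped in a small bash function along these lines. This is only a sketch: the function name `large_files_in_history` and the `HG` variable are made up for illustration, it assumes you run it from inside the repository, and it simply asks `hg files` about every revision, so it will not be any faster than the one-liner.

```shell
#!/usr/bin/env bash
# Sketch: list every path that exceeds a size threshold in at least one revision.
# HG and large_files_in_history are illustrative names, not part of Mercurial.

HG="${HG:-hg}"   # hypothetical override so a different hg executable can be used

large_files_in_history() {
    # Threshold is any size the size() fileset accepts, e.g. 400k or 1M.
    local threshold="${1:-1M}"
    local rev

    # Print every revision number, one per line, then query each revision.
    "$HG" log -r "all()" --template "{rev}\n" |
    while read -r rev; do
        # "hg files" exits non-zero when nothing matches, so ignore its status.
        # stdin is redirected so the command cannot consume the revision list.
        "$HG" files -r "$rev" "set:size('>$threshold')" < /dev/null || true
    done | sort -u   # a file that is large in many revisions is listed once
}
```

Called as `large_files_in_history 400k`, it prints the de-duplicated list of paths, including files that were deleted later, which you could then feed into a filemap for hg convert.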