Suppose I have a small Git repository with three commits:
commit cccc:
    updated smile.png (LFS)
    updated manual.md
commit bbbb:
    updated smile.png (LFS)   <==== Don't want this specific one anymore!
    added manual.md
commit aaaa:
    added smile.png (LFS)
    added README.md
    added .gitattributes
I've added three different versions of the LFS file smile.png, but I've determined that I no longer want or need the middle one in my repository. I do not mind altering the Git history. I also want to shrink the overall size of my repository.
I know that git filter-repo --path smile.png --invert-paths can be used to completely remove all instances of and references to smile.png. But is there a way to remove the specific version from commit bbbb while keeping the versions from aaaa and cccc?
The use of Git-LFS adds a small wrinkle to what is otherwise pretty simple.
You can "remove" commit bbbb. To do so, you must also "remove" commit cccc. I put "remove" in quotes here because Git doesn't actually remove commits. It just shoves them off to the side. They remain in your repository for some time, so that you can get them back if you decide that "removing" them was a mistake.
How long they remain, and why, is a somewhat complicated affair, but the default is to retain deleted commits for a minimum of 30 days. Meanwhile, the reason that you must remove cccc when removing bbbb is simple enough: each commit depends on the existence of all previous commits. So you can't just rip one out of the middle of a chain. You have to rip out that one and all subsequent commits.
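The "shoved off to the side" behavior can be seen in a throwaway repository. The following is only a sketch under stated assumptions: the scratch directory, file names, and commit messages are made up for illustration, and expiring the reflog and pruning immediately is one common way to skip the retention period, not something you need to do in normal use.

```shell
set -e
# Scratch repository with two commits (all names here are illustrative).
scratch=$(mktemp -d)
cd "$scratch"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
echo one > file.txt
git add file.txt
git commit -q -m "first"
echo two > file.txt
git commit -q -a -m "second"
second=$(git rev-parse HEAD)

# "Remove" the second commit by moving the branch name back one step.
git reset -q --hard HEAD~1

# The commit object is still in the repository and can be found by hash.
if git cat-file -e "$second^{commit}" 2>/dev/null; then before=present; else before=gone; fi

# Discard what still remembers the old commit, then prune right away.
git update-ref -d ORIG_HEAD            # reset saved the old tip here
git reflog expire --expire=now --all   # drop all reflog entries
git gc -q --prune=now                  # delete unreachable objects now

if git cat-file -e "$second^{commit}" 2>/dev/null; then after=present; else after=gone; fi
echo "before gc: $before, after gc: $after"
```

Without the last three commands, the "removed" commit stays retrievable by its hash ID until Git's normal expiry gets around to it.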
What this means is that to keep the contents of commit cccc, you'll need to make a new-and-improved version of cccc. The newness of the replacement is automatic: no existing commit can ever be changed, but new commits can always be added. The improved-ness of the commit is that it contains the snapshot you want (however you choose to arrange for this) and that it links back to commit aaaa. So, when looking at commits, Git will now start at the last commit cccd (or whatever its hash ID is) and see that one, then move back to aaaa and see that one, and you'll see the history you like.
Both git filter-branch and git filter-repo can do this kind of surgery easily. There are other ways to do the same surgery; in this particular case, with just one commit to copy, we could do it with git commit-tree (to make the new and improved cccd) and git reset (to move the branch name to find cccd), for instance. See any of the many Stack Overflow questions about editing history for the many options (git replace, the commit-tree method, The BFG, filter-branch, filter-repo, etc.) here.
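The commit-tree-plus-reset approach might look roughly like the sketch below. It builds a throwaway repository shaped like the one in the question so that it can be run safely; plain text files stand in for the LFS-tracked smile.png, and the commit messages and the shell variables aaaa, cccc, and cccd are illustrative names holding whatever hash IDs Git actually assigns.

```shell
set -e
# Throwaway repository standing in for the one in the question.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo v1 > smile.png
echo readme > README.md
git add .
git commit -q -m "added smile.png and README.md"
aaaa=$(git rev-parse HEAD)

echo v2 > smile.png
echo manual > manual.md
git add .
git commit -q -m "updated smile.png, added manual.md"

echo v3 > smile.png
echo "manual, revised" > manual.md
git add .
git commit -q -m "updated smile.png and manual.md"
cccc=$(git rev-parse HEAD)

# Make the new-and-improved commit: the same snapshot (tree) as cccc,
# but with aaaa as its parent instead of bbbb.
cccd=$(git commit-tree -p "$aaaa" -m "updated smile.png and manual.md" "$cccc^{tree}")

# Move the current branch name so that it finds cccd.
git reset -q --hard "$cccd"

git log --oneline
```

In a real repository you would skip the setup and substitute the actual hash IDs of aaaa and cccc. The old bbbb and cccc commits are not deleted by this; they are merely no longer found by starting from the branch name.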
Here's what to know about the use of Git-LFS: When you add and commit a "large file" in Git-LFS, the LFS software secretly replaces your file with an "LFS pointer file" (which is tiny: typically well under 1 KiB). This means that Git doesn't store your file at all; it stores the LFS pointer file instead. The LFS code has already stored your file somewhere else (on some other web site),[1] and uses the pointer file to find the stored file. When you have Git check out some particular commit, the Git-LFS software intercepts the checkout, notices that some files have been secretly replaced with pointers, and goes to the LFS web site to retrieve the large files.
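To give a feel for how small a pointer file is, here is a sketch that builds one by hand in the format described by the Git LFS pointer specification: a version line, the SHA-256 of the real content, and its size in bytes. It does not use Git-LFS itself. The sample file is made up, and sha256sum is assumed to be available (on macOS, shasum -a 256 would be the usual substitute).

```shell
set -e
# A stand-in for the real large file (contents are illustrative).
file=$(mktemp)
printf 'pretend this is a large image\n' > "$file"

# The pointer records a hash of the content and its size in bytes.
oid=$(sha256sum "$file" | cut -d' ' -f1)
size=$(wc -c < "$file" | tr -d ' ')

# Three short lines are all that Git itself would store.
pointer=$(printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s' "$oid" "$size")
printf '%s\n' "$pointer"
```

In a repository that really uses Git-LFS, running git show on the committed path (for instance, git show cccc:smile.png) should print text of this shape rather than image data.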
When you do your history rewrite, you'll make a new commit cccd that has the exact same content as cccc. That's good because the pointer file in cccd will be the one from cccc. So the LFS interceptor will replace it with the same larger file. But: commit bbbb contained a pointer file to some file stored on the other web site, where the large files are kept. This other web site has no idea that you'll never, ever refer to commit bbbb again.[2] So they are going to keep the large file.
If you want them to get rid of the for-bbbb version of the large file, you'll need some other mechanism, one that is entirely outside Git itself, to get rid of it. That's not something that any part of Git will do. Note that if you're using GitHub specifically, you may have some issues here; see "How to delete a file tracked by git-lfs and release the storage quota?"
[1] This "separate web site" could be the main hosting provider web site, or a secondary site, or completely separate from some hosting-provider web site. The details are up to you and your LFS configuration.
[2] Assuming, that is, that you don't change your mind and restore commit bbbb.