git, commit, commit-message, git-filter-repo

git-filter-repo: --commit-callback or --message-callback with --preserve-commit-hashes not working?


I'm trying to update commit messages while keeping the same commit hashes.

I tried both options, --message-callback and --commit-callback, but no matter which one I choose, new hashes are generated. Here is how I run it:

python3 git-filter-repo.py --preserve-commit-hashes --message-callback '
if b"blabla" not in message:
    message = b"MyMessage " + message
return message' --force

Is this a bug, or am I doing something completely wrong?

Any help is appreciated.


Solution

  • I'm trying to update commit messages while keeping the same commit hashes.

    This is not possible. The hash is a cryptographic checksum of the entire content of the commit. Changing a single bit in the message has the same radical effect on the checksum as changing a single bit in the timestamp: the new commit gets a new, unique hash ID. This is how other Git commands (on any computer) recognize that this is not the same commit as the original commit. If the hash didn't change, you would not be able to store the updated commit, nor send it to any other Git.

    This is a fundamental concept at the heart of the storage model for Git: that the hash ID is the object. The pigeonhole principle be damned, every bitstream must have its own unique hash ID. If you can break the hash function, you can break—or at least stymie—progress in the repository.
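To make the point concrete, here is a minimal sketch of how an object ID is derived, assuming a repository that uses SHA-1 (the long-standing default): the ID is the SHA-1 of a short header followed by the object's bytes. The commit contents below (author, timestamps, messages) are made up for illustration, and the helper names git_object_id and commit_body are hypothetical, not part of Git or filter-repo.

```python
import hashlib


def git_object_id(obj_type: bytes, body: bytes) -> str:
    """Return the SHA-1 object ID for an object of the given type and body."""
    # Git hashes "<type> <size>\0" followed by the raw object contents.
    header = obj_type + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(message: bytes) -> bytes:
    """Build the raw bytes of a hypothetical commit with the given message."""
    # The tree is the well-known empty tree; author and dates are invented.
    return (
        b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
        b"author A U Thor <author@example.com> 1700000000 +0000\n"
        b"committer A U Thor <author@example.com> 1700000000 +0000\n"
        b"\n" + message
    )


original = git_object_id(b"commit", commit_body(b"Fix typo\n"))
rewritten = git_object_id(b"commit", commit_body(b"MyMessage Fix typo\n"))

print(original)
print(rewritten)
print(original != rewritten)
```

The message is part of the hashed bytes, so prefixing it with "MyMessage " necessarily produces a different ID. The same applies to every descendant, because each commit records its parents' IDs in its own contents.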

    (The --preserve-commit-hashes option does not keep the hash IDs of the rewritten commits. By default, filter-repo looks in each commit message for patterns that resemble commit hash IDs, looks them up in the translation table it generates, and substitutes the new hash ID; this option turns that substitution off. Turning it off is the opposite of what you'd want with a repository in which there was extensive use of git cherry-pick -x, where each cherry-picked commit's message says it is a cherry-pick of some previous commit H. For that case, filter-repo tries to make sure the earlier commits are handled first, and then replaces the outdated hash ID in the subsequent commits' messages. I have no idea how well this works in practice: the goal is obvious but the details get pretty sticky. I'm not entirely sure why this option exists at all, but if you're doing a history rewrite and some things that resemble commit hashes, but aren't actually commit hashes, are being damaged, that would probably be why you would use this option.)
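The message rewriting that --preserve-commit-hashes switches off can be pictured roughly as follows. This is an illustrative sketch only, not filter-repo's actual code: the translation table, the regular expression, and the function update_hash_references are all assumptions made for the example.

```python
import re

# Hypothetical old -> new commit ID table, as a rewrite tool might build it.
TRANSLATION = {
    b"1111111111111111111111111111111111111111":
        b"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
}

# Anything that looks like a full or abbreviated hex commit ID.
HASH_LIKE = re.compile(rb"\b[0-9a-f]{7,40}\b")


def update_hash_references(message: bytes, table: dict, preserve: bool = False) -> bytes:
    """Replace old commit IDs mentioned in a message with their new IDs."""
    if preserve:
        # Analogous to --preserve-commit-hashes: leave the message text alone.
        return message

    def replace(match):
        old = match.group(0)
        # Accept abbreviated IDs, but only when they match exactly one entry.
        candidates = [new for full, new in table.items() if full.startswith(old)]
        if len(candidates) != 1:
            return old  # unknown or ambiguous: keep the text as written
        return candidates[0][:len(old)]  # keep the original abbreviation length

    return HASH_LIKE.sub(replace, message)


msg = b"Fix bug\n\n(cherry picked from commit 1111111111111111111111111111111111111111)\n"
print(update_hash_references(msg, TRANSLATION))
print(update_hash_references(msg, TRANSLATION, preserve=True))
```

With the default behavior the cherry-pick note is updated to point at the rewritten commit. With preserve=True the message text stays byte-for-byte identical, although the commit itself still gets a new hash ID whenever anything else about it changes.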