Tags: git, replace, repository, git-rewrite-history, bfg-repo-cleaner

How to batch-replace *all* instances (content, filenames and commit messages) of *Foo* with *Bar* in a Git repository in a single, simple step?


Suppose I have a giant repo for an as-yet-unpublished software product called "Hammerstein", written by the famous German software company "Apfel", where I am an employee.

One day, "Apfel" spins out the Hammerstein division and sells it to the even more famous company "Oráculo", which renames "Hammerstein" to "Reineta" as a matter of national pride and decides to open-source it.

Agreements mandate that all references to "Hammerstein" and "Apfel" be replaced by "Reineta" and "Oráculo" respectively throughout the repository.

All filenames, all commit messages, everything must be replaced.

So, for example:

  1. src/core/ApfelCore/main.cpp must become src/core/OraculoCore/main.cpp.

  2. The commit message "Add support for Apfel Groupware Server" must become "Add support for Oraculo Groupware Server".

  3. The strings ApfelServerInstance* local_apfel, #define HAMMERSTEIN and Url("http://apfel.de") must become OraculoServerInstance* local_oraculo, #define REINETA, etc.
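All three examples boil down to one literal replacement table applied to paths, commit messages and file content alike. A minimal sketch, assuming the casing variants below are the only ones that occur (the table is hypothetical; it uses the unaccented "Oraculo" to match the examples above):

```python
import re

# Hypothetical replacement table for this example. Keys are matched
# literally and case-sensitively, so each casing variant is listed.
REPLACEMENTS = {
    "Hammerstein": "Reineta",
    "HAMMERSTEIN": "REINETA",
    "hammerstein": "reineta",
    "Apfel": "Oraculo",
    "APFEL": "ORACULO",
    "apfel": "oraculo",
}

# One alternation regex, longest key first, so every replacement happens
# in a single left-to-right pass and replaced text is never re-processed.
_PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(REPLACEMENTS, key=len, reverse=True))
)


def rebrand(text: str) -> str:
    """Apply every replacement in REPLACEMENTS to text in one pass."""
    return _PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], text)
```

For instance, rebrand("src/core/ApfelCore/main.cpp") gives "src/core/OraculoCore/main.cpp". A purely mechanical table like this also rewrites URLs such as http://apfel.de, which may deserve a manual look.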

This also applies to files that are no longer in HEAD.

What is the simplest and most pain-free method to achieve it with minimal manual intervention (so that it can be applied in batch to a potentially large number of repositories/assets)?

  1. BFG can replace the strings, but it seems to have only a --delete-files option, not a --rename-file one, and even that only matches file names (with globs), not arbitrary patterns over paths
  2. This approach seems to work only for HEAD and not for the whole history; I have had no luck using it with --tree-filter

Solution

  • Full disclosure: I'm the author of the BFG Repo-Cleaner

    As you say in the question, the BFG supports replacing file content with the --replace-text flag - but this flag does not extend to file names and commit messages. So, what alterations to the codebase would it take to make the BFG's --replace-text operation extend to those too?
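    For the file-content part, --replace-text reads an expressions file with one expression per line, where "old==>new" replaces a literal with something other than the default ***REMOVED***. A small sketch that generates such a file from a mapping (the mapping and helper names are hypothetical; keys that the BFG would parse specially are simply rejected):

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical mapping for this example; each casing variant is listed explicitly.
REPLACEMENTS = {
    "Hammerstein": "Reineta",
    "HAMMERSTEIN": "REINETA",
    "hammerstein": "reineta",
    "Apfel": "Oraculo",
    "APFEL": "ORACULO",
    "apfel": "oraculo",
}


def bfg_expressions(replacements: dict[str, str]) -> list[str]:
    """Render one literal 'old==>new' expression per line for --replace-text."""
    lines = []
    for old, new in replacements.items():
        # '==>' separates old from new, and a 'regex:' or 'glob:' prefix
        # changes how a line is parsed, so refuse such keys in this sketch.
        if "==>" in old or "==>" in new or old.startswith(("regex:", "glob:")):
            raise ValueError(f"cannot express {old!r} as a plain literal")
        if "\n" in old or "\n" in new:
            raise ValueError("each expression must fit on one line")
        lines.append(f"{old}==>{new}")
    return lines


def write_expressions_file(path: Path, replacements: dict[str, str]) -> None:
    """Write the expressions file, one expression per line, UTF-8 encoded."""
    path.write_text("\n".join(bfg_expressions(replacements)) + "\n", encoding="utf-8")
```

    After writing replacements.txt, a run against a fresh git clone --mirror of each repository could look like java -jar bfg.jar --replace-text replacements.txt --no-blob-protection my-repo.git, followed by git reflog expire --expire=now --all && git gc --prune=now --aggressive inside the mirror. By default the BFG leaves the contents of the current HEAD commit untouched; --no-blob-protection turns that protection off.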

    This comes down to hooking in some new Cleaner[V] implementations, where V is the type of thing you want to clean (a commit message, a directory listing), and the Cleaner just has the job of producing a new, clean V from an old, dirty V. To perform the actual text change, you can re-use the same text-replacing function used for file content changes.
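    In Python terms, the Cleaner[V] idea is just a function from V to V, and one text-replacing function can be plugged into several of them. A sketch of the concept under that assumption; Cleaner, text_cleaner and chain are illustrative names, not the BFG's Scala API:

```python
from __future__ import annotations

import re
from typing import Callable, Mapping, TypeVar

V = TypeVar("V")

# A cleaner takes an old ("dirty") V and returns a new, clean V.
Cleaner = Callable[[V], V]


def text_cleaner(replacements: Mapping[str, str]) -> Cleaner[str]:
    """Build one reusable str -> str cleaner from a literal replacement table."""
    if not replacements:
        return lambda text: text  # nothing to replace: identity cleaner
    pattern = re.compile(
        "|".join(re.escape(k) for k in sorted(replacements, key=len, reverse=True))
    )
    return lambda text: pattern.sub(lambda m: replacements[m.group(0)], text)


def chain(*cleaners: Cleaner[V]) -> Cleaner[V]:
    """Compose cleaners left to right into a single cleaner."""
    def run(value: V) -> V:
        for clean in cleaners:
            value = clean(value)
        return value
    return run
```

    The same text_cleaner instance can then back a file-content cleaner, a file-name cleaner and a commit-message cleaner, which is the reuse the paragraph above describes.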

    File Names

    Use a Cleaner[Seq[Tree.Entry]] - 'tree' is what Git calls folders ('file tree') - so you would just update the FileName on each Tree.Entry.
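    A rough sketch of what such a directory-listing cleaner has to do, using a hypothetical TreeEntry type rather than JGit's: rename each entry, reject renames that would collide, and re-sort, because Git requires tree entries in a fixed order (byte-wise by name, with directory names compared as if followed by "/") and a rename can change that order. Recursing into sub-trees and recomputing object ids is left out.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Callable, Sequence


# Hypothetical stand-in for a Git tree entry; not JGit's or the BFG's type.
@dataclass(frozen=True)
class TreeEntry:
    name: str       # one path component (file or directory name)
    is_tree: bool   # True for a sub-directory
    object_id: str  # id of the blob or sub-tree


def _git_sort_key(entry: TreeEntry) -> bytes:
    # Git compares names as raw bytes, treating directory names as if they
    # ended in "/". UTF-8 encoding of names is an assumption of this sketch.
    suffix = "/" if entry.is_tree else ""
    return (entry.name + suffix).encode("utf-8")


def clean_tree_entries(
    entries: Sequence[TreeEntry], clean_name: Callable[[str], str]
) -> list[TreeEntry]:
    """Rename every entry with clean_name and restore Git's entry order."""
    renamed = [replace(e, name=clean_name(e.name)) for e in entries]
    names = [e.name for e in renamed]
    if len(names) != len(set(names)):
        raise ValueError(f"renaming produced duplicate names: {names}")
    return sorted(renamed, key=_git_sort_key)
```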

    Commit Messages

    Use a Cleaner[CommitNode] - again, you're just replacing text on the message field - see the CommitMessageObjectIdsUpdater for a very close example of what you're trying to do. While you're there, you could also do something with the author and committer email addresses if you wanted to (e.g. purge the old Apfel addresses, I guess).
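    A sketch of that commit-side cleaner, assuming a hypothetical CommitNode with only a message and two email fields (the BFG's real type differs) and a caller-supplied email map; the example addresses in the test use the reserved .example domain rather than real ones.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Callable, Mapping


# Hypothetical stand-in for the commit data a cleaner would see.
@dataclass(frozen=True)
class CommitNode:
    author_email: str
    committer_email: str
    message: str


def commit_cleaner(
    clean_text: Callable[[str], str], email_map: Mapping[str, str]
) -> Callable[[CommitNode], CommitNode]:
    """Return a cleaner that rewrites the message and remaps known emails."""
    def clean(commit: CommitNode) -> CommitNode:
        return replace(
            commit,
            message=clean_text(commit.message),
            # Addresses not in email_map are kept unchanged.
            author_email=email_map.get(commit.author_email, commit.author_email),
            committer_email=email_map.get(
                commit.committer_email, commit.committer_email
            ),
        )
    return clean
```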

    Speed

    As mentioned by @VonC in his answer, filter-branch can do both these replacements (file name & commit message), but while the --msg-filter flag should do the commit message updates reasonably quickly, I believe filter-branch will be excruciatingly slow for renaming files within a large code base like yours. The BFG, optimised for exactly this kind of operation, will be several hundred times faster.
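    One plausible reason a purpose-built tool can be so much faster than --tree-filter (which checks out every commit into a working directory and runs a shell command there) is that most trees are shared between commits. The sketch below, over a hypothetical toy object store, memoises tree rewriting by object id so each distinct tree is rewritten once no matter how many commits reference it. It illustrates the technique only; it is not the BFG's code, and the hashing and entry ordering are simplified.

```python
from __future__ import annotations

import hashlib
from typing import Callable, Dict, List, Tuple

# Toy object store (hypothetical): a tree id maps to a list of
# (name, child_id, is_tree) entries. Blob contents are ignored here.
Tree = List[Tuple[str, str, bool]]


class MemoTreeRenamer:
    """Rename entries in trees, rewriting each distinct tree id only once."""

    def __init__(self, trees: Dict[str, Tree], clean_name: Callable[[str], str]):
        self.trees = trees
        self.clean_name = clean_name
        self.memo: Dict[str, str] = {}  # old tree id -> new tree id
        self.work_done = 0              # number of trees actually rewritten

    def clean(self, tree_id: str) -> str:
        if tree_id in self.memo:
            return self.memo[tree_id]  # shared tree: reuse earlier result
        self.work_done += 1
        # Rename entries and recurse into sub-trees; plain sort keeps the
        # toy hash stable (simplified, not Git's exact ordering rule).
        new_entries = sorted(
            (self.clean_name(name), self.clean(child) if is_tree else child, is_tree)
            for name, child, is_tree in self.trees[tree_id]
        )
        # Content-address the new tree (simplified stand-in for Git hashing).
        new_id = hashlib.sha1(repr(new_entries).encode()).hexdigest()
        self.trees[new_id] = new_entries
        self.memo[tree_id] = new_id
        return new_id
```

    With two commits whose root trees share one sub-directory, only three trees are rewritten instead of four; across thousands of commits sharing mostly identical trees, that saving compounds.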

    The BFG accepts donations at https://www.bountysource.com/teams/bfg-repo-cleaner - so if you'd like to support development of this feature, or if you just found the BFG useful in solving your problem, that's where you can make a difference.