Suppose I have a giant repo for an as-yet unpublished software product called "Hammerstein", written by the famous German software company "Apfel", of which I am an employee.
One day, "Apfel" spins out the Hammerstein division and sells it to the even more famous company "Oráculo" which renames "Hammerstein" to "Reineta" as a matter of national pride and decides to open source it.
Agreements mandate that all references to "Apfel" and "Hammerstein" in the repository be replaced by "Oráculo" and "Reineta", respectively.
All filenames, all commit messages, everything must be replaced.
So, for example, src/core/ApfelCore/main.cpp must become src/core/OraculoCore/main.cpp.
The commit message that says "Add support for Apfel Groupware Server" must become "Add support for Oraculo Groupware Server".
The strings ApfelServerInstance* local_apfel, #define HAMMERSTEIN and Url("http://apfel.de") must become OraculoServerInstance* local_oraculo, #define REINETA, etc.
This also applies to files that are no longer in HEAD.
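The renaming rules in these examples amount to a case-preserving word mapping (Apfel → Oraculo, Hammerstein → Reineta, keeping lower, UPPER and Title case). A minimal Python sketch of that mapping, assuming plain substring matching and the ASCII spelling "Oraculo" used in the examples; RENAMES and rebrand are illustrative names, not part of any tool:

```python
import re

# Hypothetical mapping taken from the examples above (lower-case keys).
RENAMES = {"apfel": "oraculo", "hammerstein": "reineta"}

# One alternation over all old names, matched case-insensitively.
_PATTERN = re.compile("|".join(map(re.escape, RENAMES)), re.IGNORECASE)


def _match_case(template: str, word: str) -> str:
    """Give `word` the casing style of `template`: UPPER, lower or Title."""
    if template.isupper():
        return word.upper()
    if template.islower():
        return word.lower()
    if template[0].isupper():
        return word[0].upper() + word[1:].lower()
    return word  # mixed case: fall back to the mapping's own spelling


def rebrand(text: str) -> str:
    """Replace every old name in `text`, preserving the case style of each hit."""
    return _PATTERN.sub(
        lambda m: _match_case(m.group(0), RENAMES[m.group(0).lower()]), text
    )
```

Plain substring matching also rewrites the old names inside unrelated words, and it would turn Url("http://apfel.de") into Url("http://oraculo.de"), which may not be the intended domain, so URLs probably need their own explicit rules.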
What is the simplest and most pain-free method to achieve it with minimal manual intervention (so that it can be applied in batch to a potentially large number of repositories/assets)?
… --delete-file option, not a --rename-file, and even then it does not take patterns as an argument. … HEAD and not for the whole history; I have had no luck using it with --tree-filter.
Full disclosure: I'm the author of the BFG Repo-Cleaner.
As you say in the question, the BFG supports replacing file content with the --replace-text flag, but this flag does not extend to file names and commit messages. So, what changes to the codebase would it take to make the BFG's --replace-text operation extend to those too?
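For file contents alone, the mapping can already be fed to the BFG: each line of the expressions file passed to --replace-text is a literal to find, optionally followed by ==> and its replacement. The sketch below renders such lines from old/new pairs; the pair list and the file name replacements.txt are assumptions, and each casing variant is listed explicitly rather than relying on case-insensitive matching.

```python
from pathlib import Path

# Hypothetical old -> new pairs; every casing variant is spelled out because
# each rendered line is a plain literal.
PAIRS = [
    ("Apfel", "Oraculo"), ("apfel", "oraculo"), ("APFEL", "ORACULO"),
    ("Hammerstein", "Reineta"), ("hammerstein", "reineta"),
    ("HAMMERSTEIN", "REINETA"),
]


def replacement_lines(pairs):
    """Render pairs in the --replace-text format: one `old==>new` per line."""
    return [f"{old}==>{new}" for old, new in pairs]


def write_replacements(path="replacements.txt", pairs=PAIRS):
    """Write the expressions file that --replace-text will read."""
    Path(path).write_text("\n".join(replacement_lines(pairs)) + "\n", encoding="utf-8")
```

The resulting file could then be passed as java -jar bfg.jar --replace-text replacements.txt my-repo.git (jar and repository names assumed).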
This comes down to hooking in some new Cleaner[V] implementations, where V is the type of thing you want to clean (a commit message, a directory listing), and the Cleaner just has the job of producing a new, clean V from an old, dirty V. To perform the actual text change, you can re-use the same text-replacing function used for file content changes.
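In Python terms, a Cleaner[V] is just a function from V to V. This sketch models that idea with a generic type alias and builds one string cleaner that can be reused for any text-bearing V (a file name, a commit message); Cleaner and text_cleaner here are illustrative names, not the BFG's Scala API.

```python
from typing import Callable, TypeVar

V = TypeVar("V")

# Mirrors the idea of Cleaner[V]: turn an old, dirty V into a new, clean V.
Cleaner = Callable[[V], V]


def text_cleaner(replacements: dict[str, str]) -> Cleaner[str]:
    """Build a str -> str cleaner from literal old -> new pairs (hypothetical helper)."""
    def clean(text: str) -> str:
        # Apply each literal replacement in insertion order.
        for old, new in replacements.items():
            text = text.replace(old, new)
        return text
    return clean
```

The same clean function can then be handed to whatever cleans file names and whatever cleans commit messages, so the replacement rules live in one place.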
For file names, use a Cleaner[Seq[Tree.Entry]] ('tree' is what Git calls a folder, as in 'file tree'), so you would just update the FileName on each Tree.Entry.
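A sketch of the file-name case, using a hypothetical, simplified TreeEntry record in place of the BFG's Tree.Entry: the cleaner maps each entry's name through a string cleaner and leaves mode and object id alone. A real rewrite also has to give any changed sub-tree a new object id, which this sketch does not attempt; it does refuse to produce two identical names in one folder, since a rename could otherwise collide with an existing entry.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


# Hypothetical stand-in for a Git tree entry; not the BFG's Tree.Entry class.
@dataclass(frozen=True)
class TreeEntry:
    name: str       # file or directory name within this one folder
    mode: str       # e.g. "100644" for a regular file, "40000" for a sub-tree
    object_id: str  # blob or tree id; left untouched by this sketch


def tree_entries_cleaner(
    clean_text: Callable[[str], str],
) -> Callable[[List[TreeEntry]], List[TreeEntry]]:
    """Return a cleaner that renames every entry in one folder listing."""
    def clean(entries: List[TreeEntry]) -> List[TreeEntry]:
        cleaned = [replace(e, name=clean_text(e.name)) for e in entries]
        names = [e.name for e in cleaned]
        if len(names) != len(set(names)):
            raise ValueError("renaming produced duplicate names in one tree")
        return cleaned
    return clean
```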
For commit messages, use a Cleaner[CommitNode]. Again, you're just replacing text, this time on the message field; see the CommitMessageObjectIdsUpdater for a very close example of what you're trying to do. While you're there, you could also do something with the author and committer email addresses if you wanted to (e.g. purge ...@apfel.com, I guess).
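Similarly, a sketch of the commit case with a hypothetical Commit record standing in for CommitNode: the message goes through the same kind of string cleaner, and the optional email rule (replace_domain, also hypothetical) only swaps an exact domain match such as apfel.com. The target domain oraculo.example used in the usage below is a placeholder.

```python
from dataclasses import dataclass, replace
from typing import Callable


# Hypothetical, simplified commit record; not the BFG's CommitNode class.
@dataclass(frozen=True)
class Commit:
    message: str
    author_email: str
    committer_email: str


def commit_cleaner(clean_text: Callable[[str], str],
                   clean_email: Callable[[str], str]) -> Callable[[Commit], Commit]:
    """Return a cleaner that rewrites the message and both email fields."""
    def clean(c: Commit) -> Commit:
        return replace(c,
                       message=clean_text(c.message),
                       author_email=clean_email(c.author_email),
                       committer_email=clean_email(c.committer_email))
    return clean


def replace_domain(old: str, new: str) -> Callable[[str], str]:
    """Hypothetical email rule: swap the domain part only when it equals `old`."""
    def clean(email: str) -> str:
        local, sep, domain = email.rpartition("@")
        return f"{local}@{new}" if sep and domain == old else email
    return clean
```

For example, commit_cleaner(lambda s: s.replace("Apfel", "Oraculo"), replace_domain("apfel.com", "oraculo.example")) would rewrite both the message and any @apfel.com addresses while leaving other domains alone.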
As mentioned by @VonC in his answer, filter-branch can do both of these replacements (file names and commit messages). But while the --msg-filter flag should handle the commit message updates reasonably quickly, I believe filter-branch will be excruciatingly slow for renaming files within a large code base like yours. The BFG, optimised for exactly this kind of operation, will be several hundred times faster.
The BFG accepts donations at https://www.bountysource.com/teams/bfg-repo-cleaner - so if you'd like to support development of this feature, or if you just found the BFG useful in solving your problem, that's where you can make a difference.