Mercurial repository cleanup preserving Kiln/Fogbugz history


TL;DR Version: Is it possible to reorganize a Mercurial repo without breaking Kiln/FogBugz history? Or do I have to start fresh?


I have a repository that is a real mess, in need of some serious cleanup, and am trying to figure out how best to do it. The goal is to remove a few files entirely -- they should not appear in any commits, ever -- move a few directories, and split one directory out into an entirely separate repository. I know, I know -- you're not supposed to be able to change history. In this case, however, it's either change history or start from scratch with new repositories.

The repository in question is managed in Mercurial, with the remote repository hosted in Kiln. Issues are tracked in FogBugz. Thanks to some commit link-processing rules, any reference in a commit message to an issue (case) number, such as "Case 123", is converted to a link to the corresponding FogBugz case. In turn, the case that was mentioned has a note appended to it with the commit message.
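To make the cross-referencing concrete, here is a minimal sketch of how such a rule might pick case numbers out of a commit message. The pattern is my own assumption (the word "Case", optional "#", then digits); the actual Kiln link-processing rule may well differ, and `case_numbers` is a hypothetical helper name.

```python
import re

# Assumed pattern, not Kiln's actual rule: "Case" (any capitalization),
# whitespace, an optional "#", then the case number.
CASE_REF = re.compile(r"\bcase\s+#?(\d+)\b", re.IGNORECASE)


def case_numbers(message):
    """Return the case numbers referenced in a commit message, in order."""
    return [int(number) for number in CASE_REF.findall(message)]
```

Running this over the messages of a rewritten repository is one way to check that the references the links depend on survived the cleanup.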

Current Structure

The project file structure is currently something like this:

- /
    +- includes/
    |   +- functions-related-to-abc.php
    |   +- functions-related-to-xyz.php
    |   +- class-something.php
    |   +- classes-several-things.php
    |   +- random-file.php
    |   ...
    |
    +- development/
    |   +- a-plugin-folder/
    |   |   +- some-file.php
    |   |   +- file-with-sensitive-and-non-sensitive-info.php
    |   |   ...
    |   |
    |   +- some-backend-functions-related-to-coding.php
    |   ...
    |
    +- index.php
    +- test-config-file.php
    ...

Target Structure

The structure I want is something like this:

- /
    +- build/
    +- doc/
    +- src/
    |   +- functions/
    |   |   +- abc.php  // renamed from includes/functions-related-to-abc.php
    |   |   +- xyz.php  // renamed from includes/functions-related-to-xyz.php
    |   |   ...
    |   |
    |   +- classes/
    |   |   +- something.php       // renamed from includes/class-something.php
    |   |   +- several-things.php  // renamed from includes/classes-several-things.php
    |   |   ...
    |   |
    |   +- view/
    |   |   +- random-file.php  // formerly includes/random-file.php
    |   ...
    |
    |   +- development/
    |   |   +- some-backend-functions-related-to-coding.php
    |   |   ...
    |   +- index.php
    |   ...
    |
    +- test/
    ...

a-plugin-folder would move to its own, separate repository. test-config-file.php would no longer be tracked in the repository at all. Ideally, I will also do some minor pruning and renaming of branches while I'm at it.

In my dream world, file-with-sensitive-and-non-sensitive-info.php would somehow be tracked consistently, but with the sensitive info (a couple of passwords) yanked out into a config file that is not under version control. I realize that's probably wishful thinking.

My Current Thinking

My current thinking is that my wish list is basically impossible: I can create new, properly structured repositories from this point forward, but cannot preserve my change history and also make the radical structural changes I need to make. In this view, I should take the current code base, reorganize it all the way I want it, and commit it as changeset 1 for two new repositories (the root repository and the plugin repository). I would then just keep a copy of the old repository backed up somewhere for reference. Major downsides: (1) I lose all my history and (2) the Kiln and Fogbugz cross-references for historical commits are all toast.

My Question

So, here's the question: is there any way to do what I want -- restructure, pull a few files out, and get everything looking pretty -- without losing all of my history?

I have considered using the hg convert extension, making heavy use of the filemap, splicemap, and branchmap options. The problems I see with that approach include: (1) breaking all prior builds, (2) not having file-with-sensitive-and-non-sensitive-info.php in prior builds at all (or leaving it in, which defeats the point), and (3) rendering many of the commit messages wildly incorrect to the extent they refer to file names or repo structure. In other words, I'm not sure this option gains me much as opposed to just starting clean, properly structured repositories.
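For reference, a filemap is a plain text file of `include`, `exclude`, and `rename` directives, one per line. The sketch below generates two such files for the layout described above, assuming paths without spaces; `build_filemap` is a hypothetical helper, and how the exclude of `development/a-plugin-folder` interacts with the rename of `development` should be verified on a throwaway copy of the repository.

```python
def build_filemap(includes=(), excludes=(), renames=()):
    """Return filemap text made of include/exclude/rename directives.

    Assumes no path contains whitespace, so no quoting is attempted.
    """
    lines = [f"include {path}" for path in includes]
    lines += [f"exclude {path}" for path in excludes]
    lines += [f"rename {src} {dst}" for src, dst in renames]
    return "\n".join(lines) + "\n"


# Main repository: drop the config file and the plugin, move the rest under src/.
main_filemap = build_filemap(
    excludes=["test-config-file.php", "development/a-plugin-folder"],
    renames=[
        ("includes/functions-related-to-abc.php", "src/functions/abc.php"),
        ("includes/functions-related-to-xyz.php", "src/functions/xyz.php"),
        ("includes/class-something.php", "src/classes/something.php"),
        ("includes/classes-several-things.php", "src/classes/several-things.php"),
        ("includes/random-file.php", "src/view/random-file.php"),
        ("development", "src/development"),
        ("index.php", "src/index.php"),
    ],
)

# Plugin repository: keep only the plugin folder and move it to the root.
plugin_filemap = build_filemap(
    includes=["development/a-plugin-folder"],
    renames=[("development/a-plugin-folder", ".")],
)
```

Each string would be written to its own file and passed to `hg convert --filemap`. Files under `includes/` that are not listed keep their old paths, so the real map would need an entry for each of them.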

I have also considered the extreme option: writing a custom script of some sort to build a new repository by going through each existing commit, stripping sensitive information out of file-with-sensitive-and-non-sensitive-info.php, rewriting commit messages to the extent necessary, and committing the revised version of everything. This, theoretically, could solve all of my problems, but at the cost of reinventing the wheel and probably taking a ridiculous amount of time. I'm looking for something that isn't the equivalent of writing an entire hg extension.

EDIT: I am considering creating an empty repository, then writing a script that uses hg export and hg import to bring changesets over one at a time, making edits where necessary to strip sensitive information like passwords out of files. Is there a reason this wouldn't work?
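The editing step of that idea could look roughly like the following, assuming the secrets are known literal strings and the text comes from `hg export`. `scrub_patch` and the `REDACTED` placeholder are hypothetical; this is a sketch of the string handling only, not a tested migration script.

```python
def scrub_patch(patch_text, secrets, placeholder="REDACTED"):
    """Replace each known secret in an exported patch with a placeholder.

    Replacing text within a line leaves the number of lines unchanged, so
    the line counts in the hunk headers stay valid. Context lines are
    rewritten too, which keeps later patches consistent with earlier ones.
    """
    for secret in secrets:
        if "\n" in secret:
            raise ValueError("secrets must not span multiple lines")
        patch_text = patch_text.replace(secret, placeholder)
    return patch_text
```

One caveat: a changeset whose only change was the secret itself ends up with identical removed and added lines, so it no longer changes anything and would need to be skipped or folded into a neighbor.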


Solution

  • I was able to accomplish my goals. Here's what I ended up doing:

    • First, I "flattened out" (straightened) the repository by eliminating all branches and merges and turning the repo into a single line of commits. I had to do this because hg histedit -- the key to the whole cleanup -- doesn't work on history containing merges. This was okay with me, because there were no really meaningful branches or merges in this particular repository and there is only one author in the relevant history. I probably could have retained the branches and merged again as necessary later, but this was easier for my purposes. To do this I used hg rebase and the MQ extension. (Special thanks to @tghw for this extremely helpful answer, which helped me understand for the first time how MQ really works.)

    • Next, I used hg convert to create several repositories from the original repository -- one for each library/plugin that I needed to put into its own repository and one main repository for the rest of the code. In the process, I used --filemap and --branchmap to reorganize everything as necessary.

    • Third, I used hg histedit on each new repository to (1) clean up irrelevant commit messages as needed and (2) remove sensitive information.

    • Fourth, I pushed all of the new repositories to Kiln, which automatically linked them to FogBugz cases using the same rules I had in place for the original repository (e.g., Case 123 in the commit message creates a link to FogBugz case # 123).

    • Finally, I "deleted" the original repository in Kiln. Kiln doesn't truly and permanently delete repositories as of right now, though I have proposed a use case for making that possible. Instead, it delinks FogBugz cases and puts the "deleted" repository into cold storage; an account administrator can restore it, but it is otherwise invisible.
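For anyone repeating the conversion step, the splitting can be scripted by running `hg convert` once per target repository. The sketch below only builds the argument lists. The repository and map file names are placeholders rather than the ones I actually used, and the `--config extensions.convert=` part assumes the convert extension is not already enabled in your hgrc.

```python
def convert_command(source, dest, filemap, branchmap=None):
    """Build the argument list for one `hg convert` run (does not execute it)."""
    cmd = ["hg", "--config", "extensions.convert=", "convert",
           "--filemap", filemap]
    if branchmap is not None:
        cmd += ["--branchmap", branchmap]
    cmd += [source, dest]
    return cmd


# Placeholder names: one filemap (and optionally one branchmap) per new repository.
targets = [
    ("main-repo", "main.filemap", "branches.map"),
    ("plugin-repo", "plugin.filemap", None),
]
commands = [convert_command("old-repo", dest, fmap, bmap)
            for dest, fmap, bmap in targets]
```

Each list can be handed to `subprocess.run(cmd, check=True)`. Converting into fresh destination directories leaves the original repository untouched, so a bad filemap only costs a rerun.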

    All told, it took about 10 hours to split the original repository into 6 pieces and clean each part thereof. Some of that was learning curve; I could probably do the whole thing in more like 6 hours if I had to do it again. A long day, but worth it for the dramatically improved repository structure and cleaned-up code.

    Everything is now as it should be. Hopefully, this will help other users. Please feel free to post a comment if you have a similar issue and would like additional insight from my experience.