Tags: git, open-source, release, code-analysis, revision-history

Automatically rewrite git history for open source release


I'm currently releasing several projects as open source. Typically the complete source is provided as a ZIP archive or checked in to an open source repository. This makes analysis by Ohloh difficult.

If the software has been developed in a non-public repository, the complete history is available. However, I do not want to release the full history.

I want to use git to achieve one of the following two possibilities:

(i) One commit per author: There should be one commit per author, with the final release date as the commit date. Each commit contains that author's lines of code that made it into the final version.

(ii) Original commits with only the final code lines: In this variant, the number of commits is preserved. Each commit is modified so that only the lines that made it into the final version are kept; all other lines are deleted.

Has anyone implemented one of the variants yet? Variant (i) seems to be doable using git-blame and some scripting.
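To illustrate what "git-blame and some scripting" could look like for variant (i), here is a minimal Python sketch. It assumes the output of `git blame --line-porcelain` as input and groups the surviving lines of one file by author; the function names `blame_porcelain` and `lines_by_author` are hypothetical and not taken from any existing tool.

```python
import subprocess
from collections import defaultdict


def blame_porcelain(repo_dir, path):
    """Return `git blame --line-porcelain` output for one file at HEAD."""
    result = subprocess.run(
        ["git", "blame", "--line-porcelain", "HEAD", "--", path],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return result.stdout


def lines_by_author(porcelain):
    """Map author name -> list of (final line number, line content)."""
    grouped = defaultdict(list)
    author = None
    final_line = None
    expecting_header = True  # the next non-content line is "<sha> <orig> <final> ..."
    for raw in porcelain.split("\n"):
        if raw.startswith("\t"):
            # Content lines are prefixed with a single TAB and end a record.
            grouped[author].append((final_line, raw[1:]))
            expecting_header = True
        elif not raw:
            continue  # trailing empty string after the last newline
        elif expecting_header:
            final_line = int(raw.split()[2])
            expecting_header = False
        elif raw.startswith("author "):
            author = raw[len("author "):]
    return dict(grouped)
```

Each author's entries could then be written back at their line positions and committed once per author. How partially filled files are represented in the intermediate commits is a design decision this sketch leaves open.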


Solution

  • git-oss-releaser is a solution for option (i).

    git-oss-releaser converts a given git repository into a new git repository that contains only the files of the last commit, with commits that mirror the git blame output for each file. The original history is lost.

    usage: git-oss-releaser.py [-h] repoDir outDir

    Positional arguments:

    • repoDir: The repository to transform. May also be a subdirectory of a repo.
    • outDir: The directory where the new repo should be created. Has to be empty.

    Optional arguments:

    • --name NAME: The user.name to use for committing the files. Defaults to git's global user.name.
    • --email EMAIL: The user.email to use for committing the files. Defaults to git's global user.email.
    • --date DATE: The date to use for commits. Defaults to the date of the last commit.

    Note that git distinguishes between the author and the committer of a commit. The author is taken from git blame; the committer data is taken from the global user.name and user.email, or from --name and --email if given.

    DEBUG mode can currently only be enabled in the code.

    Limitations

    • Works only on git repositories without untracked files
    • Empty lines are assigned to "git-oss-releaser" and not to the first or last author who added them
    • Repository has to contain at least one non-binary file
    • Commit date is derived from non-binary files only
    • Tested under Git for Windows only
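
The author/committer distinction mentioned above can be reproduced with git's standard environment variables. The following is a minimal sketch, not the tool's actual implementation; `commit_env` and `commit_as` are hypothetical helper names, and the names, addresses, and date in the usage note are made up.

```python
import os
import subprocess


def commit_env(author_name, author_email, committer_name, committer_email,
               date, base=None):
    """Build an environment that sets author and committer separately."""
    env = dict(os.environ if base is None else base)
    env.update({
        "GIT_AUTHOR_NAME": author_name,        # e.g. taken from git blame
        "GIT_AUTHOR_EMAIL": author_email,
        "GIT_AUTHOR_DATE": date,
        "GIT_COMMITTER_NAME": committer_name,  # e.g. --name or global user.name
        "GIT_COMMITTER_EMAIL": committer_email,
        "GIT_COMMITTER_DATE": date,
    })
    return env


def commit_as(repo_dir, message, env):
    """Commit whatever is currently staged in repo_dir using the given env."""
    subprocess.run(
        ["git", "commit", "--quiet", "-m", message],
        cwd=repo_dir, env=env, check=True,
    )
```

For example, `commit_as(out_dir, "Lines by Alice", commit_env("Alice", "alice@example.com", "Releaser", "releaser@example.com", "2014-01-31T12:00:00+0000"))` would record Alice as author and Releaser as committer, both with the same date.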