Why does Git GUI keep calling me Jon Skeet?


As a joke, I named my laptop's user account Jon Skeet. I've configured my per-repository options to call me wizzwizz4, yet when I view my commits I see this:

Author: Jon Skeet <jon@myLaptop>  2018-12-21 22:07:11
Committer: wizzwizz4 <[email protected]>  2018-12-21 22:12:07
Parent: 39c31f5aebe43cdddbe00432207e4bb2cc6a777e (Initial commit)
Branches: master
Follows: 
Precedes: 

Why does it keep doing this when my repository settings clearly state my intentions? I don't want Jon getting credit for my code! Committing from the command-line has the expected result.


Solution

  • There are multiple ways this can happen. Before we get to them all, we need to cover some background. It's also worth emphasizing a key point: no data in any existing commit can be altered, not even a single bit. As long as commit 39c31f5aebe43cdddbe00432207e4bb2cc6a777e exists in your repository, it will continue to have the same information. (Note that this is the parent of the commit you showed. You did not show the actual hash ID of the commit itself, so I couldn't use that one.)

    In the very specific case of Git-GUI (git-gui.sh, which I never use), it looks from the source as though there is a feature where using "amend" reads the HEAD commit's author information and replicates it. It should do this when you select "amending" (a name that, as explained below, is a white lie, since nothing is actually amended) and should not do this when not amending. Unlike command-line git commit, there appears to be no Git-GUI knob to amend without retaining the author. If it's accidentally applying the author retention to all new commits, that's just a bug.

    For more, read on.

    Background

    Every commit has some metadata associated with it. There are two relevant metadata lines in the raw commit object, called the author and the committer. These two are typically, but not necessarily, the same, as one can see from various commits in the Git repository for Git itself. For instance:

    $ git cat-file -p 5d826e972970a784bd7a7bdf587512510097b8c7
    tree c790c47fe551d5ed812cfefdac243eb972c1fde3
    parent b5796d9a3263b26a8ef32eeca76b3c1d62fcedc5
    author Junio C Hamano <gitster pobox.com> 1544328981 +0900
    committer Junio C Hamano <gitster pobox.com> 1544328981 +0900
    
    Git 2.20
    
    Signed-off-by: Junio C Hamano <gitster pobox.com>
    

    (I've replaced @ with a space to possibly cut down on spam harvesting.) But the two fields can differ:

    $ git cat-file -p 6fcbad87d476d7281832af843dd448c94673fbfc
    tree aa05bc7af6e92f3db5d5d738adf0d0b1b3dd23b6
    parent b00bf1c9a8dd5009d5102aef7af9e2b886b1e5ad
    author Johannes Sixt <j6t kdbg.org> 1543858489 +0100
    committer Junio C Hamano <gitster pobox.com> 1543891852 +0900
    
    rebase docs: fix incorrect format of [... snip]
    

    Note that there are actually three parts to each of the two fields: full name, email address in <angle brackets>, and timestamp-with-zone-offset.
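To see the three parts of each field without reading the raw object, the format placeholders of git log can print them one by one. This is a minimal sketch in a throwaway repository; the name "Demo User" and the example.com address are placeholders, not anything from the question.

```shell
# Scratch repository; the name and address are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo hello > file.txt
git add file.txt
git commit -q -m "demo"

# %an/%ae/%ad are the author's name, email, and date;
# %cn/%ce/%cd are the committer's. --date=raw prints "<seconds> <offset>".
author=$(git log -1 --date=raw --format='%an|%ae|%ad')
committer=$(git log -1 --date=raw --format='%cn|%ce|%cd')
echo "author:    $author"
echo "committer: $committer"
```

For an ordinary commit like this one, both lines should show the same name and address.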

    When you make a new commit using git commit, Git usually sets both author and committer to the same three strings. But many Git commands copy some existing commit to a new-and-improved replacement. The new commit has a new and different hash ID, by definition, but is meant to be used instead of the old one. For these cases, Git normally preserves the original author information and sets you (and now) as the committer.

    For reference, the author-preserving commit-copying commands are git commit with either of the --amend or the -c / -C options; git cherry-pick; and git rebase. The git am command is meant to turn emailed patches into commits: it takes something other than a commit as its input, so we could say it's author-preserving, but then we have to define what we mean by author. In this case, git am guesses at the authorship information by parsing a mailbox-formatted message.
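The author-preserving behavior is easy to reproduce with git commit --amend. The sketch below assumes a scratch repository and two made-up identities ("First Identity" and "Second Identity"); it is an illustration of the rule, not a reconstruction of what Git-GUI does internally.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # placeholder address

# First commit, made under one identity.
git config user.name "First Identity"
echo one > file.txt
git add file.txt
git commit -q -m "original"
old_hash=$(git rev-parse HEAD)

# Switch identity, then "amend": Git writes a replacement commit.
git config user.name "Second Identity"
git commit -q --amend -m "original, reworded"
new_hash=$(git rev-parse HEAD)

author=$(git log -1 --format=%an)
committer=$(git log -1 --format=%cn)
echo "author=$author committer=$committer"
```

The replacement keeps the old author and records the new identity only as committer, which is the same split shown in the question. The old commit object still exists under its original hash; it is simply no longer the branch tip.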

    Mechanisms for each field

    There is a single underlying Git command, git commit-tree, that other commands either use, or have built into them. This actually builds the commit object, which contains the above metadata. It can take various directives to set each field individually. If some field is not set, git commit-tree can take a default value from somewhere.

    Since there are six parts—name, email-address, and time-stamp for each of author and committer—there are six places to get specific directives, and many places—not six this time!—to get defaults. First, though, let's enumerate the primary six.

    Rather than taking command-line options, git commit-tree takes these six items from environment variables, as described in the documentation:

    GIT_AUTHOR_NAME
    GIT_AUTHOR_EMAIL
    GIT_AUTHOR_DATE
    GIT_COMMITTER_NAME
    GIT_COMMITTER_EMAIL
    GIT_COMMITTER_DATE
    

    If you set any or all of these variables, the values you set go into all subsequent new commits, until you unset the variables or the session that holds this environment ends (which discards them as well).
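As a rough illustration, all six variables can be supplied for a single git commit-tree call. The names and addresses are invented; the author timestamp is copied from the Git 2.20 commit shown earlier and the committer timestamp from the second example.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# Write the (empty) index as a tree so commit-tree has something to point at.
tree=$(git write-tree)

# Supply all six values through the environment, for this one command only.
commit=$(
  GIT_AUTHOR_NAME="Alice Author" \
  GIT_AUTHOR_EMAIL="alice@example.com" \
  GIT_AUTHOR_DATE="1544328981 +0900" \
  GIT_COMMITTER_NAME="Carol Committer" \
  GIT_COMMITTER_EMAIL="carol@example.com" \
  GIT_COMMITTER_DATE="1543891852 +0900" \
  git commit-tree "$tree" -m "built by hand"
)

author_line=$(git cat-file -p "$commit" | grep '^author ')
committer_line=$(git cat-file -p "$commit" | grep '^committer ')
echo "$author_line"
echo "$committer_line"
```

Because the variables are set only as a prefix to one command, they do not leak into later commits. Exporting them in a shell profile is what makes them stick.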

    If not, well, the documentation goes on to say:

    In case (some of) these environment variables are not set, the information is taken from the configuration items user.name and user.email, or, if not present, the environment variable EMAIL, or, if that is not set, system user name and the hostname used for outgoing mail (taken from /etc/mailname and falling back to the fully qualified hostname when that file does not exist).

    This is a bit of a white lie, as the actual code path taken depends on compile-time options, so that different Git installations can have different customized defaults. But the general overall idea is correct: Git will use your user.name and user.email setting first, for both author and committer, if you have not overridden one or both with the various environment variables.
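One way to check which identity Git would pick, without making a commit, is git var GIT_AUTHOR_IDENT. The sketch below assumes a scratch repository with placeholder settings and shows the environment variable taking precedence over user.name while the email still comes from the configuration.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Config Name"
git config user.email "config@example.com"

# Make sure no override leaks in from the calling shell.
unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

# With nothing in the environment, the identity comes from the configuration.
from_config=$(git var GIT_AUTHOR_IDENT)

# The environment variable wins over user.name; the email is still from config.
from_env=$(GIT_AUTHOR_NAME="Env Name" git var GIT_AUTHOR_IDENT)

echo "$from_config"
echo "$from_env"
```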

    The default timestamp, of course, is simply your own computer's idea of the current time. The relatively new user.useConfigOnly setting tells modern Git not to guess at user.name and/or user.email. In older versions of Git, Git did no guessing: if these were not set, git commit-tree and git commit would just fail with an error message, saying that it did not know who you are.

    The git commit front-end command also takes --author and --date as arguments. These arguments can specify the user name, email address, and/or time-stamp to use in the new commit; git commit effectively implements these by setting, for the duration of the commit operation, the GIT_AUTHOR_* variables.
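A short sketch of these two options, again in a scratch repository with invented identities. The date is given in Git's internal "seconds offset" form so that it can be read back unchanged.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Carol Committer"
git config user.email "carol@example.com"

echo hello > file.txt
git add file.txt
git commit -q -m "patch from Alice" \
  --author="Alice Author <alice@example.com>" \
  --date="1544328981 +0900"

# --date=raw prints the stored timestamp and offset as recorded.
author=$(git log -1 --format='%an <%ae>')
author_date=$(git log -1 --date=raw --format=%ad)
committer=$(git log -1 --format='%cn <%ce>')
echo "author:    $author $author_date"
echo "committer: $committer"
```

Only the author fields are affected; the committer still comes from the configuration or the GIT_COMMITTER_* variables.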

    When using the git commit front end with the --amend flag—which, despite its name, does not actually change a commit; it just makes a new one to use instead of the current one, with all that this implies—the --reset-author flag tells the front end not to preserve the original commit's author information.
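Applied to the situation in the question, that might look like the sketch below. The example.com address is a placeholder, and the "Jon Skeet" commit is created artificially with --author to stand in for the one Git-GUI produced.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "wizzwizz4"
git config user.email "wizzwizz4@example.com"   # placeholder address

# A commit recorded under the unwanted author.
echo hello > file.txt
git add file.txt
git commit -q -m "work" --author="Jon Skeet <jon@myLaptop>"
before=$(git log -1 --format=%an)

# Replace it, discarding the old author in favor of the current identity.
git commit -q --amend --no-edit --reset-author
after=$(git log -1 --format=%an)

echo "before=$before after=$after"
```

Note that --reset-author also replaces the author timestamp with the current time.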

    Conclusion

    If new commits are getting the wrong author, while getting the right committer, one of two things must be the case:

    • You're using --author. Stop!

    • You have GIT_AUTHOR_NAME and GIT_AUTHOR_EMAIL set in your environment. Stop setting them!
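To check for the second cause, something like the helper below can be run inside the affected repository. check_identity is a hypothetical name, not a Git command, and the demonstration underneath deliberately leaks an override into a scratch repository so there is something to find.

```shell
# Hypothetical helper: report anything that would override user.name/user.email.
check_identity() {
  env | grep -E '^GIT_(AUTHOR|COMMITTER)_(NAME|EMAIL|DATE)=' \
    || echo "no GIT_AUTHOR_*/GIT_COMMITTER_* overrides in the environment"
  # Show which config file each setting comes from (needs Git 2.8 or later).
  git config --show-origin --get-all user.name
  git config --show-origin --get-all user.email
  # The author identity Git would use for a commit made right now.
  git var GIT_AUTHOR_IDENT
}

# Demonstration in a scratch repository with a deliberately leaked override.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "wizzwizz4"
git config user.email "wizzwizz4@example.com"   # placeholder address
report=$(export GIT_AUTHOR_NAME="Jon Skeet"; check_identity)
echo "$report"
```

If the first line of the report lists a GIT_AUTHOR_* variable, that variable is overriding the repository settings for the author field only, which matches the symptom of a wrong author with a correct committer.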

    If you have some existing commit, and you're trying to replace it with a new and improved commit via git commit --amend, but it is retaining its author setting, simply add --reset-author. Of course, this only works from the command line. If you're using something else, find out if it has a similar option.

    If some existing commit has the wrong author, you're stuck with that. You can copy the existing, not-so-good, commit to a new and improved commit, and try to convince everyone else who has a copy of this same repository—a clone—to pick up and use the new-and-improved commit instead of the old one. How hard that will be obviously depends on how stubborn the other users are, and also where the commit is.

    Commits that are at the tip of their branch, and are not on any other branch, are quite easy to swap out, using git commit --amend. Commits further back in history are more difficult: you can use interactive rebase, or git replace, or, in particularly ugly cases, git filter-branch, to swap them out (sometimes combining these techniques). Any "change" to an older commit necessarily ripples down through all of its descendants by design,[1] so this kind of change can be quite disruptive. However, if it's "changing" (replacing, really) history that no one else has ever seen, it's safe enough.


    [1] The immediate children of a "bad" commit record the bad commit's hash ID as their parent. Hence, to get the children to refer to the replacement, we must replace the children too. That means we must then replace their children, and so on down the line, all the way to the tip commits of each affected branch.
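For history that no one else has seen, one possible way to apply the rebase route is to replay every commit and re-stamp its author after each pick. This is a sketch under stated assumptions: a scratch repository, a placeholder address, three artificial "Jon Skeet" commits, and a rebase from the root. In a real repository you would more likely rebase onto the last good commit instead of using --root.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "wizzwizz4"
git config user.email "wizzwizz4@example.com"   # placeholder address

# Three commits, all recorded under the unwanted author.
for n in 1 2 3; do
  echo "$n" > file.txt
  git add file.txt
  git commit -q -m "commit $n" --author="Jon Skeet <jon@myLaptop>"
done
old_tip=$(git rev-parse HEAD)

# Replay every commit from the root, re-stamping the author after each pick.
# Note that --reset-author also resets each author timestamp.
git rebase -q --root --exec 'git commit -q --amend --no-edit --reset-author'

new_tip=$(git rev-parse HEAD)
authors=$(git log --format=%an | sort -u)
count=$(git rev-list --count HEAD)
echo "authors now: $authors ($count commits)"
```

Every commit gets a new hash ID, including the tip, which is the ripple effect the footnote describes.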