Tags: git, git-add, git-plumbing

Which plumbing commands achieve the same as git add?


I'd like to understand Git better by learning what actually happens when I run

git add $DIRECTORY

and

git add $FILE

How does it work?

A rough idea can be gained from the Git Internals chapter of Pro Git:

  • If $DIRECTORY is a directory, something like find $DIRECTORY -type f -exec git add {} \;, i.e. recursively adding all files in $DIRECTORY. Then, git add $FILE applies for each file.
  • A check against .gitignore (and its "superiors")
  • A check against .gitattributes, running a clean filter if applicable
  • git hash-object -w the cleaned contents

And then the index gets updated somehow, which involves git mktree. But what exactly happens there? Does the tree for a directory contain only the files added, or all files that were previously committed as well? And what happens next?


Solution

  • git add does not have a single equivalent plumbing command, but the closest one is probably git update-index. The ProGit description is correct:

    1. Replace each directory with a list of the directory's contents. The result is the list of files specified by the add, with some special case handling for files now known to be not in the directory (i.e., removed), and for files with special index states (--assume-unchanged and --skip-worktree). In other words, this step also consults the current index.

    2. Check for unstaged-but-ignored (via .gitignore) files and discard them from the list (with a warning) unless given -f / --force.

      (Side note: I have not tested this on subdirectories, and it's possible that -f won't apply to a subdirectory entry picked up by the recursive scan, but only to names actually given on the command line. If that's the case, step 2 must be combined with step 1, so that names don't get added if we're going to ignore them even with -f.)

    3. Apply attributes if any, making temporary cleaned copies of files if needed.

    4. Use git update-index --add --remove --replace to get modified files written to the repository, with their index entries updated, including mode updates. (For files that are cleaned in step 3, you would have to use a separate git hash-object -w, as you suggested, and --index-info instead of --add --remove --replace.)
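The steps above can be sketched with plumbing commands roughly as follows. This is a minimal sketch, not what git add literally executes: the throwaway repository, the file names greeting.txt and debug.log, and the hard-coded mode 100644 (a regular, non-executable file) are assumptions made for illustration. It relies on git hash-object applying the path's clean filter by default when it is given a file name.

```shell
#!/bin/sh
set -e

# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Hypothetical working-tree contents: one normal file, one ignored file.
printf '*.log\n' > .gitignore
printf 'hello\n' > greeting.txt
printf 'noise\n' > debug.log

for f in greeting.txt debug.log; do
    # Step 2: drop paths matched by .gitignore (git add without -f).
    if git check-ignore -q "$f"; then
        echo "skipping ignored file: $f" >&2
        continue
    fi

    # Step 3 and first half of step 4: write the (filtered) contents
    # to the object database as a blob and capture its object ID.
    blob=$(git hash-object -w "$f")

    # Second half of step 4: record mode, blob ID, and path in the index.
    # The mode 100644 is assumed here; an executable file would use 100755.
    git update-index --add --cacheinfo 100644,"$blob","$f"
done

# Only greeting.txt should be staged, at stage 0.
staged=$(git ls-files --stage)
echo "$staged"
```

In practice, git update-index --add -- "$f" hashes the file and updates its index entry in one call; the two-step form above just makes the blob write visible.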

    The git mktree command does not enter into this process at all, as the index itself is simply a flat file using a poorly-documented format (or more precisely, one of several formats; see --index-version).
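To see that the index is a single flat file rather than a set of trees, it can be inspected directly. The sketch below assumes a throwaway repository with two made-up files (top.txt and src/lib/deep.txt). It reads the 4-byte signature at the start of .git/index and lists the entries, which are stored as full paths with no per-directory nesting.

```shell
#!/bin/sh
set -e

# Throwaway repository with one top-level file and one nested file.
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir -p src/lib
printf 'a\n' > top.txt
printf 'b\n' > src/lib/deep.txt
git add top.txt src/lib/deep.txt

# The index file starts with a 4-byte signature ("dircache").
sig=$(head -c 4 .git/index)
echo "index signature: $sig"

# Entries are a flat, sorted list of full paths.
paths=$(git ls-files)
echo "$paths"
```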

    The index allows up to four entries per file name, called stages: stage 0 is a normal cache entry, and stages 1 through 3 are for conflicted merges. There are several special bits for marking files as removed, --assume-unchanged, --skip-worktree, or --intent-to-add, plus some flags reserved for internal use. Even though Git does not store directories, there are also index entries for directories. These let Git look at the ctime field of a directory and so skip unmodified directories quickly, provided it can trust the OS to maintain that field.
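The stage numbers can be made visible by placing entries at stages 1 through 3 by hand, which is roughly the state a conflicted merge leaves behind. This is an artificial sketch: the path file.txt and the three blob contents are invented, and no real merge is performed. It feeds git update-index --index-info lines in the same "mode, object ID, stage, tab, path" layout that git ls-files --stage prints.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Three hypothetical versions of the same file, written as blobs.
base=$(printf 'base\n' | git hash-object -w --stdin)
ours=$(printf 'ours\n' | git hash-object -w --stdin)
theirs=$(printf 'theirs\n' | git hash-object -w --stdin)

# Stage 1 = common ancestor, stage 2 = our side, stage 3 = their side.
# --add permits creating entries for a path not yet in the index.
{
    printf '100644 %s 1\tfile.txt\n' "$base"
    printf '100644 %s 2\tfile.txt\n' "$ours"
    printf '100644 %s 3\tfile.txt\n' "$theirs"
} | git update-index --add --index-info

# The third column of ls-files --stage is the stage number.
git ls-files --stage
stages=$(git ls-files --stage | awk '{print $3}')
```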

    Tree objects only come into play when converting an index into a series of trees, which is the job of git write-tree rather than git mktree (git mktree builds a single tree from ls-tree-formatted text and never reads the index). Git must make one tree for each subdirectory in the index, plus one top level tree representing the overall index. (Subprojects, if any exist, are already in the index as "gitlink" entries, which is how they appear within whatever tree contains them.)
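The index-to-tree conversion can be observed with git write-tree, the plumbing command that writes the current index out as tree objects. The sketch below assumes a throwaway repository with two made-up files (top.txt and sub/inner.txt) and shows that the flat index yields one top-level tree plus a separate tree for the subdirectory.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir sub
printf 'one\n' > top.txt
printf 'two\n' > sub/inner.txt
git add top.txt sub/inner.txt

# Write the whole index as tree objects; prints the top-level tree ID.
root=$(git write-tree)
git ls-tree "$root"

# The subdirectory appears in the root tree as an entry of type "tree".
subtype=$(git ls-tree "$root" sub | awk '{print $2}')

# Resolve that entry and list the subtree's own contents.
subtree=$(git rev-parse "$root:sub")
git ls-tree "$subtree"
```

No commit is created here; a later git commit-tree would wrap the top-level tree in a commit object.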