I'd like to understand Git plumbing better by learning what actually happens when entering git add $DIRECTORY and git add $FILE. How does it work?
A rough idea can be gained from the Git Internals chapter of Pro Git:

1. If $DIRECTORY is a directory, do something like find $DIRECTORY -type f -exec git add {} \; , i.e. recursively add all files in $DIRECTORY. Then, git add $FILENAME applies for each file:
2. Check .gitignore (and its "superiors").
3. Check .gitattributes, running a clean filter if applicable.
4. git hash-object -w the cleaned contents.

And then, the index gets updated somehow, which involves git mktree. But what exactly happens there? Does the tree for a directory contain only the files added, or all files that were previously committed as well? And what happens next?
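Step 4 is where a file's contents become a blob object. As a sketch, assuming a default SHA-1 repository (not one created with --object-format=sha256) and a sha1sum binary such as the one in GNU coreutils, the object name that git hash-object prints can be reproduced without Git: it is the SHA-1 of the header "blob <size>" plus a NUL byte, followed by the raw contents. With -w, Git additionally stores a zlib-compressed copy of that same byte string under .git/objects/. The function name blob_id is made up for this example.

```sh
#!/bin/sh
# Sketch: compute a blob's object name the way git hash-object does,
# using only printf, wc, cat and sha1sum (no Git needed).
# Assumes a SHA-1 repository format and GNU coreutils' sha1sum.
set -eu

# blob_id is a hypothetical helper name, not a Git command.
blob_id() {
    size=$(wc -c < "$1" | tr -d ' ')
    # Object = header "blob <size>" + NUL byte + raw file bytes.
    { printf 'blob %s\000' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

tmp=$(mktemp)
printf 'test content\n' > "$tmp"
blob_id "$tmp"
```

Running it prints d670460b4b4aece5915caf5c68d12f560a9fe3e4, the same name that echo 'test content' | git hash-object --stdin reports.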
git add does not have a single equivalent plumbing command, but the closest one is probably git update-index. The ProGit description is correct:

1. Replace each directory with a list of the directory's contents. The result is the list of files specified by the add, with some special-case handling for files now known not to be in the directory (i.e., removed), and for files with special index states (--assume-unchanged and --skip-worktree). In other words, this step also consults the current index.
2. Check for untracked-but-ignored files (via .gitignore) and discard them from the list (with a warning) unless given -f / --force.
   (Side note: I have not tested this on subdirectories, and it's possible that -f won't apply to a subdirectory entry picked up by the recursive scan, but only to names actually given on the command line. If that's the case, step 2 must be combined with step 1, so that names don't get added if we're going to ignore them even with -f.)
3. Apply attributes, if any, making temporary cleaned copies of files if needed.
4. Use git update-index --add --remove --replace to write modified files to the repository and update their index entries, including mode changes. (For files that are cleaned in step 3, you would have to use a separate git hash-object -w, as you suggested, and --index-info instead of --add --remove --replace.)

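The four steps above can be approximated with plumbing commands. The script below is a minimal sketch under stated assumptions, not what git add literally runs: it uses a throwaway repository, hypothetical file names, and a hypothetical upper-casing clean filter named filter.upper, and it relies on git ls-files --others --modified --deleted --exclude-standard to do the directory expansion of steps 1 and 2. It compares the tree built from the plumbing-updated index with the one git add produces, probes the side note by running git add -f on the directory, and shows the hash-object -w plus --index-info route for a cleaned file.

```sh
#!/bin/sh
# Sketch: approximate "git add dir" with plumbing and compare with porcelain.
# Assumes xargs supports -0 (GNU and BSD both do).
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q

mkdir -p dir/sub
echo a > dir/a.txt
echo b > dir/sub/b.txt
echo log > dir/sub/x.log
echo '*.log' > .gitignore

# Steps 1 and 2: expand "dir" into file names, consulting the current index
# (--modified, --deleted) and the ignore rules (--exclude-standard).
# Step 4: hash each file into the object store and update its index entry.
git ls-files -z --others --modified --deleted --exclude-standard -- dir |
    xargs -0 git update-index --add --remove --replace --
plumbing_tree=$(git write-tree)

# Start again from an empty index and let porcelain do the same job.
git read-tree --empty
git add dir
porcelain_tree=$(git write-tree)
before_f=$(git ls-files dir/sub/x.log)   # empty: the ignored file was skipped

# Side-note probe: does -f reach ignored files found by the recursive scan?
git add -f dir
after_f=$(git ls-files dir/sub/x.log)

# Step 3 plus the --index-info variant of step 4. The "clean filter" is a
# hypothetical upper-casing one; hash-object --stdin applies no filters itself.
printf 'hello\n' > dir/note.up
blob=$(tr a-z A-Z < dir/note.up | git hash-object -w --stdin)
# --index-info line format used here: "mode SP object TAB path".
printf '100644 %s\tdir/note.up\n' "$blob" | git update-index --index-info
manual=$(git ls-files --stage dir/note.up)

# The same file through porcelain, with the filter configured for real.
git config filter.upper.clean 'tr a-z A-Z'
echo '*.up filter=upper' > .gitattributes
git add dir/note.up
porcelain=$(git ls-files --stage dir/note.up)

echo "trees:     $plumbing_tree $porcelain_tree"
echo "x.log:     before -f '$before_f', after -f '$after_f'"
echo "manual:    $manual"
echo "porcelain: $porcelain"
```

If after_f comes out as dir/sub/x.log, then on that Git version -f did reach an ignored file that was only picked up by the recursive scan.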
The git mktree command does not enter into this process at all, because the index itself is simply a flat file using a poorly documented format (or, more precisely, one of several formats; see git update-index --index-version).
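To see that the index really is one flat file, its fixed 12-byte header can be read directly: the signature DIRC, then a 4-byte version number and a 4-byte entry count, both big-endian (this layout follows Git's index-format documentation). A minimal sketch in a throwaway repository, using od and awk rather than any Git parsing command; the helper name be32 and the file names are made up for this example.

```sh
#!/bin/sh
# Sketch: read the fixed 12-byte header at the start of .git/index.
set -eu
cd "$(mktemp -d)"
git init -q
echo one > a
echo two > b
git add a b

# be32 (hypothetical helper): print the 4 bytes at offset $2 of file $1
# as a big-endian unsigned integer.
be32() {
    od -A n -t u1 -j "$2" -N 4 "$1" |
        awk '{ print $1 * 16777216 + $2 * 65536 + $3 * 256 + $4 }'
}

signature=$(head -c 4 .git/index)   # always "DIRC" ("dircache")
version=$(be32 .git/index 4)          # index format version
entries=$(be32 .git/index 8)          # one entry per (path, stage) pair
echo "signature=$signature version=$version entries=$entries"

# Rewrite the same index in format version 4 (path-prefix compression).
git update-index --index-version 4
v4=$(be32 .git/index 4)
echo "version now $v4"
```

The first version printed is usually 2 in a fresh repository; Git moves to 3 on its own when some entry needs extended flags (such as skip-worktree or intent-to-add), and 4 is opt-in.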
Each index entry carries a stage number from 0 to 3: stage 0 is a normal cache entry, and stages 1 through 3 (base, ours, theirs) hold the versions of a file in a conflicted merge, so one file name can have up to three entries at once. There are several special bits for marking files as removed, --assume-unchanged, --skip-worktree, or --intent-to-add, plus some flags for internal use. And even though Git does not store directories, the index can record per-directory data (the optional untracked-cache extension), which lets Git look at a directory's mtime field and skip unmodified directories quickly, provided it can trust the OS to maintain it.
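Stages and flag bits are easy to observe with git ls-files. A minimal sketch, assuming a throwaway repository with a placeholder identity and hypothetical file names f, g and h: it produces a merge conflict on f, then marks g assume-unchanged and h skip-worktree. git ls-files --stage prints the stage number in the third column, and git ls-files -v prints lowercase tags for assume-unchanged entries and S for skip-worktree entries.

```sh
#!/bin/sh
# Sketch: show index stages after a conflicted merge, plus two flag bits.
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo base > f
echo keep > g
echo skip > h
git add f g h
git commit -q -m base

git checkout -q -b other
echo theirs > f
git commit -q -a -m theirs
git checkout -q -            # back to the starting branch
echo ours > f
git commit -q -a -m ours
git merge other || true      # conflicts on f, so merge exits non-zero

# Three entries for one name: stage 1 (base), 2 (ours), 3 (theirs).
git ls-files --stage f

# Per-entry flag bits: ls-files -v shows "h" (lowercase H) for an
# assume-unchanged entry and "S" for a skip-worktree entry.
git update-index --assume-unchanged g
git update-index --skip-worktree h
git ls-files -v g h
```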
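The git mktree command only becomes relevant when converting an index into a series of tree objects, and even then Git does not invoke it: git write-tree (and the same internal code in git commit) builds the trees directly, although they are the same kind of tree object git mktree creates from ls-tree-style input. Git must make one tree for each subdirectory in the index, plus one top-level tree representing the overall index. (Subprojects, i.e. submodules, if any exist, are already in the index as "gitlink" entries, which is how they appear within whatever tree contains them.)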
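A minimal sketch of the index-to-trees conversion, which also bears on the question of what a directory's tree contains. It assumes a throwaway repository with hypothetical files and a placeholder identity: after a first commit, only src/main.c is re-added, yet git write-tree still produces trees covering every tracked file, because the index always lists all of them. The last step feeds the root tree's non-recursive ls-tree listing to git mktree, which rebuilds the identical tree object.

```sh
#!/bin/sh
# Sketch: turn an index into trees, then rebuild one tree by hand with mktree.
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

mkdir -p src/lib
echo readme > README
echo main > src/main.c
echo util > src/lib/util.c
git add .
git commit -q -m first

# Stage a change to just one file; the index still lists every tracked file.
echo changed > src/main.c
git add src/main.c

# write-tree makes one tree per directory in the index (src, src/lib)
# plus the top-level tree, and prints the top-level tree's name.
root=$(git write-tree)
git ls-tree -r -t "$root"     # -t also lists the subtree entries

# mktree builds a single tree from non-recursive ls-tree output, so feeding
# it the root tree's own listing gives back the same object name.
rebuilt=$(git ls-tree "$root" | git mktree)
echo "$root"
echo "$rebuilt"
```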