I tried to add a Korean file name (or one in another language) to .gitignore, but it didn't work.
In .gitignore:
#ignore 예제파일/ (=exampleFile/)
예제파일/
Any suggestions?
iBug's comment has one of the keys to making this work. The other is to be sure that the file is untracked.
The index, which is also called the staging area or sometimes the cache, controls whether a file is tracked or untracked. The index is also what Git uses when making new commits, so every file that is in the index goes into the next commit you make, once you make it. To see a list of every file in the index, along with its staging information, use git ls-files --stage (note that this can be a very long list!): each file's path name appears at the end of its output line.
Git reports an untracked file when, in the process of scanning through a directory, it comes across a file whose path name is (a) not already in the index and (b) not listed in an ignore-or-exclude file. (There is some special handling for directories here, but let's leave that for later.)
In other words, any file in the index is tracked. A file that is not in the index is untracked, and some untracked files are also ignored. Crucially, a tracked file is never ignored.
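To make the tracked / untracked / ignored distinction concrete, here is a toy model in Python. It is only a sketch: the classify function, the sample index, and the sample patterns are all made up for illustration, and fnmatchcase is a rough stand-in for Git's own pattern matcher, not what Git actually runs.

```python
from fnmatch import fnmatchcase


def classify(path, index, ignore_patterns):
    """Return 'tracked', 'ignored', or 'untracked' for a path (toy model)."""
    if path in index:
        # Index membership wins: ignore rules are never consulted for tracked files.
        return "tracked"
    if any(fnmatchcase(path, pattern) for pattern in ignore_patterns):
        return "ignored"
    return "untracked"


# Hypothetical index contents and ignore patterns.
index = {"README.txt", "build/output.log"}
ignore_patterns = ["build/*", "*.tmp"]

print(classify("README.txt", index, ignore_patterns))        # tracked
print(classify("build/output.log", index, ignore_patterns))  # tracked, despite build/*
print(classify("build/cache.bin", index, ignore_patterns))   # ignored
print(classify("notes.md", index, ignore_patterns))          # untracked
```

The second call is the important one: the path matches an ignore pattern, but because it is in the index, the pattern has no effect.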
For files with simple ASCII-style names like README.txt or Documentation/RelNotes/2.9.5.txt, the path name is pretty obvious. It is encoded as a byte-string: the R in README or RelNotes is a byte with value 82 (in decimal anyway: it is 0x52 in hexadecimal or 0122 in octal). But for other characters, such as the ö in schön or the é in agréable, or of course your 예제파일 (which I had to cut-and-paste here :-) ), there is a problem with encoding.
Git chooses to assume that all file names are encoded in UTF-8. Your operating system may choose some other encoding internally (for instance, Windows uses UTF-16 in a number of its file systems), but Git assumes UTF-8, which has numerous advantages, including not requiring a byte order mark (BOM). This does not solve all problems, since there are still issues with normalization, but it points us to the answer we want for .gitignore files.
(Git also uses this UTF-8 form in the index.)
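The normalization issue mentioned above is easy to see in Python. This is just an illustration of the Unicode side of things, not of anything Git does internally: the same visible name can be spelled with a precomposed character (NFC) or with a base letter plus a combining mark (NFD), and the two give different UTF-8 byte strings.

```python
import unicodedata

name = "schön"
nfc = unicodedata.normalize("NFC", name)  # ö as a single code point, U+00F6
nfd = unicodedata.normalize("NFD", name)  # o followed by combining diaeresis, U+0308

nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")

print(nfc_bytes)               # b'sch\xc3\xb6n'
print(nfd_bytes)               # b'scho\xcc\x88n'
print(nfc_bytes == nfd_bytes)  # False
```

Both byte strings are valid UTF-8 and both display as schön, but a byte-for-byte comparison treats them as different names.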
When Git goes to read a .gitignore file, it opens it as a stream of bytes, which should contain the UTF-8 encoding for each file name, terminated by newlines. Then, when Git goes to read a directory to extract file (or sub-directory) names from the operating system, Git will convert these names to UTF-8 strings. If those file names represent untracked files, Git will compare the resulting UTF-8 strings with the UTF-8-encoded strings in each line in the .gitignore file.
If the UTF-8 encoded strings match, the untracked file's name is ignored (or un-ignored if prefixed with !, since of course all the usual rules apply).
If the contents of the .gitignore file are not UTF-8 encoded strings, the attempt to ignore will fail, because a UTF-8 representation of 예제파일 (b'\xec\x98\x88\xec\xa0\x9c\xed\x8c\x8c\xec\x9d\xbc' in Python, for instance) will not match a UTF-16LE representation of the same characters:
>>> fn = b'\xec\x98\x88\xec\xa0\x9c\xed\x8c\x8c\xec\x9d\xbc'
>>> fn
b'\xec\x98\x88\xec\xa0\x9c\xed\x8c\x8c\xec\x9d\xbc'
>>> fn.decode('utf-8')
'예제파일'
>>> fn.decode('utf-8').encode('utf-16le')
b'\x08\xc6\x1c\xc8\x0c\xd3|\xc7'
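Carrying that one step further, here is a small sketch of what happens when the whole .gitignore is saved in the wrong encoding. The has_matching_line helper is hypothetical and only does an exact byte comparison per line (real Git does pattern matching), but it shows why a file saved as UTF-16 by an editor can never match a UTF-8 name.

```python
import codecs

name = "예제파일"
wanted = name.encode("utf-8") + b"/"  # the bytes a UTF-8 ignore line would hold


def has_matching_line(raw, wanted):
    """Split raw file bytes on newlines and look for an exact byte match (toy check)."""
    lines = [line.rstrip(b"\r") for line in raw.split(b"\n")]
    return wanted in lines


# The same .gitignore text, saved two different ways.
utf8_file = ("#ignore 예제파일/\n" + name + "/\n").encode("utf-8")
utf16_file = codecs.BOM_UTF16_LE + (name + "/\n").encode("utf-16le")

print(has_matching_line(utf8_file, wanted))   # True
print(has_matching_line(utf16_file, wanted))  # False
```

So if the ignore rule is not working, it is worth checking which encoding your editor used when it saved .gitignore.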
Git stores only files in a repository. This creates a bit of tension between directories (which must exist to hold the files) and the files themselves. One side effect is that you can't store an empty directory in a Git commit (see How can I add an empty directory to a Git repository?), but another comes up with using .gitignore.
The operating system's facilities for finding files generally require that you start by looking inside a directory (or "folder", if you prefer that metaphor). This directory has a name inside the file system. Git will open the directory, by its name, and read through its contents, one entry at a time. Each entry will list either a file's name, or another directory's name. Git can check each such file name (after combining it with the parent directory's name and a slash, giving dir/README.txt for instance) against the index (to see if it's tracked) and then, if not tracked, against all ignore lists (to see if Git should complain about it, or ignore it).
But searching inside a directory is relatively slow. Suppose that Git has a path like a/b/c/d that represents a directory. Git can and does first look in the index to see if there are any files already tracked within a/b/c/d. If so, Git must read the directory. But if not, Git can now check all the ignore lists to see if a/b/c/d itself is ignored.
If a/b/c/d is ignored, Git is not forced to read its contents! If there are millions of files within a/b/c/d, whether in subdirectories or not, this is a major time savings. So Git does that, too. If Git never looks inside a/b/c/d, it will never find any untracked files within a/b/c/d. This is why you must explicitly un-ignore directories in some cases: to force Git to look inside them for untracked files.
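The pruning described above can be sketched as a small directory walker. This is my own simplified model, not Git's code: the scan function, its index set, and its ignored_dirs set are all hypothetical, and ignored directories are matched by exact path rather than by pattern. It records which directories it actually opened, so you can see that an ignored directory with nothing tracked inside is never read.

```python
import os
import tempfile


def scan(root, index, ignored_dirs):
    """Return (untracked_files, directories_read) as '/'-separated paths under root."""
    untracked, dirs_read = [], []

    def visit(rel, parent_ignored):
        dirs_read.append(rel or ".")
        full = os.path.join(root, rel) if rel else root
        for entry in sorted(os.scandir(full), key=lambda e: e.name):
            path = f"{rel}/{entry.name}" if rel else entry.name
            if entry.is_dir():
                dir_ignored = parent_ignored or path in ignored_dirs
                has_tracked = any(p.startswith(path + "/") for p in index)
                if dir_ignored and not has_tracked:
                    continue  # pruned: never opened, so nothing inside is reported
                visit(path, dir_ignored)
            elif path not in index and not parent_ignored:
                untracked.append(path)

    visit("", False)
    return untracked, dirs_read


with tempfile.TemporaryDirectory() as root:
    for rel in ["a/b/c/d/junk1.o", "a/b/c/d/sub/junk2.o", "a/new.txt", "README.txt"]:
        full = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()
    untracked, dirs_read = scan(root, index={"README.txt"}, ignored_dirs={"a/b/c/d"})

print(untracked)  # ['a/new.txt']
print(dirs_read)  # ['.', 'a', 'a/b', 'a/b/c']
```

Note that a/b/c/d and a/b/c/d/sub never show up in dirs_read: the walker skipped them entirely, which is exactly why nothing inside them can be un-ignored by a later rule.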
(One might think that listing, in a .gitignore, something like:
a/b/c/d
!a/b/c/d/e/important.file
would be enough to tell Git: yes, ignore everything within a/b/c/d, but still look inside d for d/e and subsequently d/e/important.file since you will have to look inside it to un-ignore such a file. And Git may become this smart at some point, but historically, it has not been. So the rule for this is to list it as:
a/b/c/d/*
!a/b/c/d/e
a/b/c/d/e/*
!a/b/c/d/e/important.file
which overrides the "ignore everything" rule for a/b/c/d/e: a/b/c/d itself is not ignored, so Git opens and reads it. Then a/b/c/d/any is ignored unless any is explicitly e, which is not ignored. So Git opens a/b/c/d/e and reads it. Anything in a/b/c/d/e is ignored except for important.file.)
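To check the two rule sets above against each other, here is a deliberately simplified evaluator. It is a sketch under stated assumptions, not Git's matcher: compile_pattern and is_ignored are hypothetical helpers, only * is treated as a wildcard (and it never crosses a slash), every pattern is treated as anchored, and the last matching rule wins. The key behavior it models is that once a parent directory is ignored, nothing below it is examined.

```python
import re


def compile_pattern(pattern):
    """Turn one ignore line into (negated, regex); only '*' is a wildcard here."""
    negated = pattern.startswith("!")
    body = pattern[1:] if negated else pattern
    regex = "".join("[^/]*" if ch == "*" else re.escape(ch) for ch in body)
    return negated, re.compile(regex + r"\Z")


def is_ignored(path, patterns):
    """Walk the path top-down; stop as soon as any leading directory is ignored."""
    rules = [compile_pattern(p) for p in patterns]
    parts = path.split("/")
    for depth in range(1, len(parts) + 1):
        prefix = "/".join(parts[:depth])
        ignored = False
        for negated, regex in rules:
            if regex.match(prefix):
                ignored = not negated  # last matching rule wins
        if ignored:
            return True  # an ignored parent is never opened, so later rules cannot help
    return False


naive = ["a/b/c/d", "!a/b/c/d/e/important.file"]
working = [
    "a/b/c/d/*",
    "!a/b/c/d/e",
    "a/b/c/d/e/*",
    "!a/b/c/d/e/important.file",
]

print(is_ignored("a/b/c/d/e/important.file", naive))    # True: still ignored
print(is_ignored("a/b/c/d/e/important.file", working))  # False: un-ignored
print(is_ignored("a/b/c/d/e/other.file", working))      # True
print(is_ignored("a/b/c/d/junk", working))              # True
```

With the naive list, the walk stops at a/b/c/d and never reaches the ! rule. With the four-line list, each directory on the way down to important.file stays un-ignored, so the final ! rule gets a chance to apply.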