unix filesystems directory relative-path inode

Why is '.' a hard link in Unix?

I've seen many explanations for why the link count for an empty directory in Unix based OSes is 2 instead of 1. They all say that it's because of the '.' directory, which every directory has pointing back to itself. I understand why having some concept of '.' is useful for specifying relative paths, but what is gained by implementing it at the filesystem level? Why not just have shells or the system calls that take paths know how to interpret it?

That '..' is a real link makes much more sense to me -- the filesystem needs to store a pointer back to the parent directory in order to navigate to it. But I don't see why '.' being a real link is necessary. It also seems like it leads to an ugly special case in the implementation -- you would think you could only free the space used by inodes that have a link count less than 1, but if they're directories, you actually need to check for a link count less than 2. Why the inconsistency?

Solution

(Hmm: the following is now a bit of an epic...)

The design of the directory on unix filesystems (which, to be pedantic, are typically but not necessarily attached to unix OSs) represents a wonderful insight, which actually reduces the number of special cases required.

A 'directory' is really just a file in the filesystem. All the actual content of files in the filesystem is in inodes (from your question, I can see that you're already aware of some of this stuff). There's no structure to the inodes on the disk -- they're just a big bunch of numbered blobs of bytes, spread like peanut-butter over the disk. This is not useful, and indeed is repellent to anyone with a shred of tidy-mindedness.

The only special inode is inode number 2 (not 0 or 1, for reasons of Tradition); inode 2 is a directory file: the root directory. When the system mounts the filesystem, it 'knows' it has to readdir inode 2, to get itself started.

A directory file is just a file, with an internal structure which is intended to be read by opendir(3) and friends. You can see its internal structure documented in dir(5) (depending on your OS); if you look at that, you'll see that the directory file entry contains almost no information about the file -- that's all in the file inode. One of the few things that's special about this file is that the open(2) function will given an error if you try to open a directory file with a mode which permits writing. Various other commands (to pick just one example, hexdump) will refuse to act in the normal way with directory files, just because that's probably not what you want to do (but that's their special case, not the filesystem's).

A hard link is nothing more nor less than an entry in a directory file's map. You can have two (or more) entries in such a map which both map to the same inode number: that inode therefore has two (or more) hard links. This also explains why every file has at least one 'hard link'. The inode has a reference count, which records how many times that inode is mentioned in a directory file somewhere in the filesystem (this is the number which you see when you do ls -l).

OK: we're getting to the point now.

The directory file is a map of strings ('filenames') to numbers (inode numbers). Those inode numbers are the numbers of the inodes of the files which are 'in' that directory. The files which are 'in' that directory might include other directory files, so their inode numbers will be amongst those listed in the directory. Thus, if you have a file /tmp/foo/bar, then the directory file foo includes an entry for bar, mapping that string to the inode for that file. There's also an entry in the directory file /tmp, for the directory file foo which is 'in' the directory /tmp.

When you create a directory with mkdir(2), that function

creates a directory file (with some inode number) with the correct internal structure,
adds an entry to the parent directory, mapping the new directory's name to this new inode (that accounts for one of the links),
adds an entry to the new directory, mapping the string '.' to the same inode (this accounts for the other link), and
adds another entry to the new directory, mapping the string '..' to the inode of the directory file it modified in step (2) (this accounts for the larger number of hard links you'll see on on directory files which contain subdirectories).

The end result is that (almost) the only special cases are:

The open(2) function tries to make it harder to shoot yourself in the foot, by preventing you opening directory files for writing.
The mkdir(2) function makes things nice and easy by adding a couple of extra entries ('.' and '..') to the new directory file, purely to make it convenient to move around the filesystem. I suspect that the filesystem would work perfectly well without '.' and '..', but would be a pain to use.
The directory file is one of the few types of files which are flagged as 'special' -- this is really what tells things like open(2) to behave slightly differently. See st_mode in stat(2).