Search code examples
cposixlanguage-lawyerlibc

What is "namespace cleanliness", and how does glibc achieve it?


I came across this paragraph from this answer by @zwol recently:

The __libc_ prefix on read is because there are actually three different names for read in the C library: read, __read, and __libc_read. This is a hack to achieve "namespace cleanliness", which you only need to worry about if you ever set out to implement a full-fledged and fully standards compliant C library. The short version is that there are many functions in the C library that need to call read, but some of them cannot use the name read to call it, because a C program is technically allowed to define a function named read itself.

As some of you may know, I am setting out to implement my own full-fledged and fully standards-compliant C library, so I'd like more details on this.

What is "namespace cleanliness", and how does glibc achieve it?


Solution

  • First, note that the identifier read is not reserved by ISO C at all. A strictly conforming ISO C program can have an external variable or function called read. Yet, POSIX has a function called read. So how can we have a POSIX platform with read that at the same time allows the C program? After all fread and fgets probably use read; won't they break?

    One way would be to split all the POSIX stuff into separate libraries: the user has to link -lio or whatever to get read and write and other functions (and then have fread and getc use some alternative read function, so they work even without -lio).

    The approach in glibc is not to use symbols like read, but instead stay out of the way by using alternative names like __libc_read in a reserved namespace. The availability of read to POSIX programs is achieved by making read a weak alias for __libc_read. Programs which make an external reference to read, but do not define it, will reach the weak symbol read which aliases to __libc_read. Programs which define read will override the weak symbol, and their references to read will all go to that override.

    The important part is that this has no effect on __libc_read. Moreover, the library itself, where it needs to use the read function, calls its internal __libc_read name that is unaffected by the program.

    So all of this adds up to a kind of cleanliness. It's not a general form of namespace cleanliness feasible in a situation with many components, but it works in a two-party situation where our only requirement is to separate "the system library" and "the user application".