
What is the purpose of "\0hidden" in an AF_UNIX socket path?


I followed a tutorial on how to make two processes on Linux communicate using the Linux Sockets API, and that's the code it showed to make it happen:

Connecting code:

char* socket_path = "\0hidden";
int fd = socket(AF_UNIX, SOCK_STREAM, 0);
struct sockaddr_un addr;
memset(&addr, 0x0, sizeof(addr));
addr.sun_family = AF_UNIX;
*addr.sun_path = '\0';
strncpy(addr.sun_path+1, socket_path+1, sizeof(addr.sun_path)-2);
connect(fd, (struct sockaddr*)&addr, sizeof(addr));

Listening code:

char* socket_path = "\0hidden";
struct sockaddr_un addr;
int fd = socket(AF_UNIX, SOCK_STREAM, 0);
memset(&addr, 0x0, sizeof(addr));
addr.sun_family = AF_UNIX;
*addr.sun_path = '\0';
strncpy(addr.sun_path+1, socket_path+1, sizeof(addr.sun_path)-2);
bind(fd, (struct sockaddr*)&addr, sizeof(addr));
listen(fd, 5);

Basically, I have written a web server for a website in C and a database management system in C++, and I am making them communicate using this mechanism. (The web server also listens for HTTP requests from users' browsers on an AF_INET family socket, but that's not important here, just some context.) The database system is listening with its socket, and the web server connects to it using its own socket. It's been working perfectly fine.

However, I never understood what the purpose of a null byte at the beginning of the socket path is. Like, what the heck does "\0hidden" mean, or what does it do? I read the manpage on sockets, it says something about virtual sockets, but it's too technical for me to get what's going on. I also don't have a clear understanding of the concept of representing sockets as files with file descriptors. I don't understand the role of the strncpy() either. I don't even understand how the web server finds the database system with this code block, is it because their processes were both started from executables in the same directory, or is it because the database system is the only process on the entire system listening on an AF_UNIX socket, or what?

If someone could explain this piece of the Linux Sockets API that has been mystifying me for so long, I'd be really grateful. I've googled and looked at multiple places, and everyone simply seems to be using "\0hidden" without ever explaining it, as if it's some basic thing that everyone should know. Like, am I missing some piece of theory here or what? Massive thanks to anybody explaining in advance!


Solution

  • This is specific to the Linux kernel's implementation of AF_UNIX local sockets. If the first byte of sun_path is a null byte (so that, read as a C string, the socket name is empty), then the name doesn't refer to anything in the filesystem namespace; the remaining bytes of the character array are treated as an internal name held in the kernel's memory. Note that this name is not null-terminated: every byte covered by the address length passed to bind or connect is significant, regardless of its value. Your example passes sizeof(addr), so the name is hidden followed by padding null bytes, and both programs must produce exactly the same byte sequence. (Therefore it is a good thing that your example program is doing a memset of the structure to zero bytes before copying in the name.)
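
    As a hedged sketch of the point about length: the helper below (make_abstract_addr is a made-up name, not part of any API) fills in a sockaddr_un for an abstract name and returns an address length that covers only the leading null byte and the name bytes, so no padding nulls become part of the name. It assumes Linux and a name that fits in sun_path.

```c
#include <stddef.h>      /* offsetof */
#include <string.h>      /* memset, memcpy, strlen */
#include <sys/socket.h>  /* AF_UNIX, socklen_t */
#include <sys/un.h>      /* struct sockaddr_un */

/* Hypothetical helper, not from the tutorial: fill *addr with the abstract
 * name `name` and return the length to pass to bind()/connect().
 * Returns 0 if the name is empty or does not fit. */
static socklen_t make_abstract_addr(struct sockaddr_un *addr, const char *name)
{
    size_t len = strlen(name);
    if (len == 0 || len > sizeof(addr->sun_path) - 1)
        return 0;

    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    addr->sun_path[0] = '\0';               /* leading NUL selects the abstract namespace */
    memcpy(addr->sun_path + 1, name, len);  /* name bytes only, no terminator */

    /* Family field + leading NUL + name bytes. */
    return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + len);
}
```

    Passing the returned length to bind and connect instead of sizeof(addr) gives the socket the 6-byte name hidden. Both sides still have to follow the same convention, because hidden with padding nulls and hidden without them are different names.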

    This allows applications to have named socket rendezvous points that do not occupy nodes in the filesystem, and are therefore more similar to TCP or UDP port numbers (which also don't sit in the file system). Your web server finds the database system purely by this name; it has nothing to do with the directory the executables were started from. These rendezvous points disappear automatically when all sockets referencing them are closed.
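
    A minimal sketch of such a rendezvous, kept inside one process for brevity: a listener binds an abstract name, a client connects using the same bytes, and a message crosses over. The function name abstract_roundtrip and the name demo-<pid> are invented for this illustration, and Linux is assumed.

```c
#include <stddef.h>      /* offsetof */
#include <stdio.h>       /* snprintf */
#include <string.h>      /* memset, memcpy, memcmp */
#include <sys/socket.h>  /* socket, bind, listen, connect, accept */
#include <sys/un.h>      /* struct sockaddr_un */
#include <unistd.h>      /* getpid, read, write, close */

/* Hypothetical demo: returns 0 if a message crosses an abstract socket. */
static int abstract_roundtrip(void)
{
    struct sockaddr_un addr;
    socklen_t len;
    char name[64];
    char buf[4];
    int srv = -1, cli = -1, conn = -1, rc = -1;

    /* "demo-<pid>" is an arbitrary name chosen for this sketch. */
    int n = snprintf(name, sizeof(name), "demo-%ld", (long)getpid());
    if (n <= 0 || (size_t)n >= sizeof(name))
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;                   /* sun_path[0] stays '\0' */
    memcpy(addr.sun_path + 1, name, (size_t)n);
    len = (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + (size_t)n);

    srv = socket(AF_UNIX, SOCK_STREAM, 0);
    cli = socket(AF_UNIX, SOCK_STREAM, 0);
    if (srv < 0 || cli < 0) goto out;

    /* Listener side: claim the name. Nothing appears in the filesystem. */
    if (bind(srv, (struct sockaddr *)&addr, len) < 0) goto out;
    if (listen(srv, 1) < 0) goto out;

    /* Client side: locate the listener purely by the same name bytes. */
    if (connect(cli, (struct sockaddr *)&addr, len) < 0) goto out;
    conn = accept(srv, NULL, NULL);
    if (conn < 0) goto out;

    if (write(cli, "ping", 4) != 4) goto out;
    if (read(conn, buf, sizeof(buf)) != 4) goto out;
    if (memcmp(buf, "ping", 4) == 0) rc = 0;
out:
    if (conn >= 0) close(conn);
    if (cli >= 0) close(cli);
    if (srv >= 0) close(srv);   /* last reference closed: the name is released */
    return rc;
}
```

    Calling abstract_roundtrip() twice in a row should succeed both times, because the name is freed as soon as the sockets are closed and no cleanup step is needed before binding it again.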

    Nodes in the file system have some disadvantages. Creating and accessing them requires a storage device. To avoid that, they can be created in a temporary filesystem that exists in RAM, like tmpfs in Linux; but tmpfs entries are almost certainly slower to access and take more RAM than a specialized entry in the AF_UNIX implementation. Socket nodes that are only needed temporarily (e.g. while an application is running) may also stay around if the application crashes, needing external intervention to clean them up.
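
    The leftover-node problem can be illustrated with a small sketch. It binds a socket to a filesystem path, closes it without unlinking (roughly what a crash leaves behind), and shows that a second bind fails with EADDRINUSE until the node is removed. The function name stale_path_demo and the /tmp path are assumptions made for this example.

```c
#include <errno.h>       /* errno, EADDRINUSE */
#include <stdio.h>       /* snprintf */
#include <string.h>      /* memset */
#include <sys/socket.h>  /* socket, bind */
#include <sys/un.h>      /* struct sockaddr_un */
#include <unistd.h>      /* getpid, close, unlink */

/* Hypothetical demo: returns 0 if a stale socket node blocks a re-bind
 * until it is unlinked, -1 otherwise. */
static int stale_path_demo(void)
{
    struct sockaddr_un addr;
    int first, first_errno, second;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    /* Made-up path for this sketch; assumes /tmp is writable. */
    snprintf(addr.sun_path, sizeof(addr.sun_path),
             "/tmp/stale-demo-%ld.sock", (long)getpid());
    unlink(addr.sun_path);                      /* start from a clean state */

    int a = socket(AF_UNIX, SOCK_STREAM, 0);
    if (a < 0)
        return -1;
    if (bind(a, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(a);
        return -1;
    }
    close(a);            /* the socket is gone, but its filesystem node remains */

    int b = socket(AF_UNIX, SOCK_STREAM, 0);
    if (b < 0) {
        unlink(addr.sun_path);
        return -1;
    }
    first = bind(b, (struct sockaddr *)&addr, sizeof(addr));
    first_errno = errno;                        /* expected: EADDRINUSE */

    unlink(addr.sun_path);                      /* the "external intervention" */
    second = bind(b, (struct sockaddr *)&addr, sizeof(addr));

    close(b);
    unlink(addr.sun_path);                      /* tidy up after the demo */
    return (first == -1 && first_errno == EADDRINUSE && second == 0) ? 0 : -1;
}
```

    With an abstract name there is no node to unlink, which is why servers using pathname sockets commonly call unlink before bind while servers using abstract names do not need to.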

    hidden is probably not a good name for a socket; programs should take advantage of the available space and use something that is all but guaranteed not to clash with anyone else's choice. On Linux sun_path is 108 bytes, so an abstract name can be up to 107 bytes long, which leaves plenty of room for some sort of UUID string.

    The Linux Programmer's Manual man page calls this kind of address "abstract". It is distinct and different from "unnamed".

    Any standard AF_UNIX implementation provides "unnamed" sockets, which can be created in two ways: any AF_UNIX socket that has been created with socket but not given an address with bind is unnamed; and the pair of sockets created by socketpair are unnamed.
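
    Both ways of getting an unnamed socket can be checked with getsockname: per the unix(7) man page, the address returned for an unnamed socket has length sizeof(sa_family_t). The sketch below assumes Linux; is_unnamed and unnamed_demo are names invented for the example.

```c
#include <sys/socket.h>  /* socket, socketpair, getsockname */
#include <sys/un.h>      /* struct sockaddr_un */
#include <unistd.h>      /* close */

/* Hypothetical helper: 1 if fd is an unnamed AF_UNIX socket, 0 if it has
 * a name, -1 if getsockname() fails. */
static int is_unnamed(int fd)
{
    struct sockaddr_un addr;
    socklen_t len = sizeof(addr);

    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    /* Only the family field is returned; sun_path must not be inspected. */
    return len == sizeof(sa_family_t);
}

/* Hypothetical demo: returns 0 if both kinds of unnamed socket are
 * reported as unnamed. */
static int unnamed_demo(void)
{
    int pair[2];
    int lone, ok;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, pair) < 0)
        return -1;

    lone = socket(AF_UNIX, SOCK_STREAM, 0);     /* created, never bound */
    if (lone < 0) {
        close(pair[0]);
        close(pair[1]);
        return -1;
    }

    ok = is_unnamed(pair[0]) == 1 &&
         is_unnamed(pair[1]) == 1 &&
         is_unnamed(lone) == 1;

    close(lone);
    close(pair[0]);
    close(pair[1]);
    return ok ? 0 : -1;
}
```

    A socket bound to an abstract name such as "\0hidden" would report a longer address, which is one concrete way to see that "abstract" and "unnamed" are different things.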

    For more information, see

    man 7 unix

    on a GNU/Linux distribution that has the Linux man-pages installed.