While toying around with an example from user_namespaces(7), I've come across a strange behaviour.
The application user-ns-ex calls clone(2) with CLONE_NEWUSER, thus creating a new process in a new user namespace. The parent process writes a map (0 1000 1) to the /proc/<pid>/uid_map file and tells the child (via a pipe) that it can proceed. The child process then execs bash.
I've copied the source code here.
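Separately from the linked source, the parent's map-writing step described above can be sketched roughly as follows. This is a minimal sketch, not the code of user-ns-ex: the helper name write_id_map and its signature are hypothetical, and it assumes the whole map is passed in a single write(2).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of user-ns-ex): write `mapping`
 * (e.g. "0 1000 1\n") to /proc/<pid>/<map_file>, where map_file is
 * "uid_map" or "gid_map".
 * Returns 0 on success, -1 on failure with errno set by open(2)/write(2). */
static int write_id_map(pid_t pid, const char *map_file, const char *mapping)
{
    char path[64];
    size_t len;
    int fd, saved;

    assert(map_file != NULL && mapping != NULL);
    snprintf(path, sizeof(path), "/proc/%ld/%s", (long)pid, map_file);

    fd = open(path, O_WRONLY);
    if (fd == -1)
        return -1;      /* EACCES here corresponds to the error in the question */

    len = strlen(mapping);
    /* Assumption: the complete map is handed over in one write(2) call. */
    if (write(fd, mapping, len) != (ssize_t)len) {
        saved = errno;  /* keep the errno from write(2) across close(2) */
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}
```

The parent would call something like write_id_map(child_pid, "uid_map", "0 1000 1\n") after clone() returns and before signalling the child through the pipe. The "Permission denied" message shown below comes from the open(2) step.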
The application can open /proc/<pid>/uid_map for writing if I give it either no capabilities or all of them.
When I set only cap_setuid, cap_setgid and, optionally, cap_sys_admin, the call to open(2) fails:
Set caps:
arksnote linux-namespaces # setcap 'cap_setuid,cap_setgid,cap_sys_admin=epi' ./user-ns-ex
arksnote linux-namespaces # getcap ./user-ns-ex
./user-ns-ex = cap_setgid,cap_setuid,cap_sys_admin+eip
Try to run:
kamyshev@arksnote ~/workspace/personal/linux-kernel/linux-namespaces $ ./user-ns-ex -v -U -M '0 1000 1' bash
./user-ns-ex: PID of child created by clone() is 19666
ERROR: open /proc/19666/uid_map: Permission denied
About to exec bash
No capabilities:
arksnote linux-namespaces # setcap '=' ./user-ns-ex
arksnote linux-namespaces # getcap ./user-ns-ex
./user-ns-ex =
Runs Ok:
kamyshev@arksnote ~/workspace/personal/linux-kernel/linux-namespaces $ ./user-ns-ex -v -U -M '0 1000 1' bash
./user-ns-ex: PID of child created by clone() is 19557
About to exec bash
arksnote linux-namespaces # exit
I've been trying to find the reason in the man pages and by playing with different capabilities, but with no luck so far. What puzzles me most is that the application runs with fewer capabilities and fails with more.
Can someone help me and clarify the issue?
I have found the reason. During my research I found that the uid_map file cannot be opened because its ownership is changed to root.
Unprivileged process, no capabilities:
parent(m): capabilities: '='
parent(m): file /proc/4644/uid_map owner uid: 1000
parent(m): file /proc/4644/uid_map owner gid: 1000
Unprivileged process, capabilities are set (cap_setuid=pe):
parent(m): capabilities: '= cap_setuid+ep'
parent(m): file /proc/4644/uid_map owner uid: 0
parent(m): file /proc/4644/uid_map owner gid: 0
ERROR: open /proc/4668/uid_map: Permission denied
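Ownership lines like the ones above can be obtained with stat(2) on the /proc file. The following is a minimal sketch of such a check; the helper name proc_file_owner is hypothetical and is not taken from the original program.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: report who owns /proc/<pid>/<name>.
 * Returns 0 and fills *uid and *gid on success, -1 if stat(2) fails. */
static int proc_file_owner(pid_t pid, const char *name, uid_t *uid, gid_t *gid)
{
    char path[64];
    struct stat st;

    assert(name != NULL && uid != NULL && gid != NULL);
    snprintf(path, sizeof(path), "/proc/%ld/%s", (long)pid, name);

    if (stat(path, &st) == -1)
        return -1;

    *uid = st.st_uid;   /* 0 here means the file is owned by root */
    *gid = st.st_gid;
    return 0;
}
```

Calling proc_file_owner(child_pid, "uid_map", &uid, &gid) in the parent right before the open(2) and printing both values shows whether the file belongs to the calling user or to root.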
Further research led me to this topic: what causes proc pid resources to become owned by root?
This is what happens:
1) When a process is not dumpable, its /proc/<pid> inodes are given root ownership:
// fs/proc/base.c
struct inode *proc_pid_make_inode(struct super_block *sb, struct task_struct *task)
...
    /* only a dumpable task gets its own euid/egid on the inode;
       otherwise the inode keeps the default root ownership */
    if (task_dumpable(task)) {
        rcu_read_lock();
        cred = __task_cred(task);
        inode->i_uid = cred->euid;
        inode->i_gid = cred->egid;
        rcu_read_unlock();
    }
2) A process is dumpable only when its "dumpable" attribute has the value 1 (SUID_DUMP_USER). See ptrace(2).
3) prctl(2) clarifies the situation further:
Normally, this flag is set to 1. However, it is reset to the current value contained in the file /proc/sys/fs/suid_dumpable (which by default has the value 0), in the following circumstances:
* The process's effective user or group ID is changed.
* The process's filesystem user or group ID is changed (see credentials(7)).
* The process executes (execve(2)) a set-user-ID or set-group-ID program, resulting in a change of either the effective user ID or the effective group ID.
* The process executes (execve(2)) a program that has file capabilities (see capabilities(7)), but only if the permitted capabilities gained exceed those already permitted for the process.
Thus my problem arose from the last of the above rules:
int commit_creds(struct cred *new)
<...>
    /* dumpability changes */
    if (!uid_eq(old->euid, new->euid) ||
        !gid_eq(old->egid, new->egid) ||
        !uid_eq(old->fsuid, new->fsuid) ||
        !gid_eq(old->fsgid, new->fsgid) ||
        !cred_cap_issubset(old, new)) {
        if (task->mm)
            set_dumpable(task->mm, suid_dumpable);
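To see whether a process has been affected by this reset, both the process's own "dumpable" flag and the system-wide default can be inspected from user space. The sketch below is only an illustration; the helper names get_dumpable and read_suid_dumpable are hypothetical.

```c
#include <stdio.h>
#include <sys/prctl.h>

/* Returns the current "dumpable" attribute of the calling process,
 * or -1 on error (see prctl(2), PR_GET_DUMPABLE). */
static int get_dumpable(void)
{
    return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
}

/* Returns the value stored in /proc/sys/fs/suid_dumpable,
 * or -1 if the file cannot be read or parsed. */
static int read_suid_dumpable(void)
{
    int value = -1;
    FILE *f = fopen("/proc/sys/fs/suid_dumpable", "r");

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

Printing get_dumpable() at the start of main() in a binary that has file capabilities set should show the value of suid_dumpable rather than 1 when the execve(2) rule quoted above applies.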
There are a number of ways to overcome the issue:
* Change the system-wide default in /proc/sys/fs/suid_dumpable: echo 1 > /proc/sys/fs/suid_dumpable
* Call prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) in the program itself.
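The prctl(2) variant could look roughly like this. It is a minimal sketch with a hypothetical helper name, restore_dumpable. Calling it early in main(), before clone(), rests on my assumption that the child inherits the flag from the parent; it could equally be called in the child before the parent opens the map file.

```c
#include <stdio.h>
#include <sys/prctl.h>

/* Hypothetical helper: mark the calling process as dumpable again, so that
 * its /proc/<pid> files are owned by its effective UID/GID instead of root.
 * Returns 0 on success, -1 on error. */
static int restore_dumpable(void)
{
    if (prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) == -1) {
        perror("prctl(PR_SET_DUMPABLE)");
        return -1;
    }
    /* Read the flag back to confirm that the change took effect. */
    return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0) == 1 ? 0 : -1;
}
```

Unlike changing /proc/sys/fs/suid_dumpable, this affects only the calling process and needs no root access, but it has to be compiled into the program.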