What I have learned so far about the Ubuntu mitigation on CVE-2022-25636
by yaobin.wen
The other day I learned about the security issue in Linux netfilter, i.e. CVE-2022-25636. [1] The official web page regarding this issue on Ubuntu provides the following mitigation:
Disable unprivileged user namespaces to restrict access to privileged
users (have CAP_NET_ADMIN) via the kernel.unprivileged_userns_clone
sysctl:
$ sudo sysctl kernel.unprivileged_userns_clone=0
My work project uses Docker, and we want to understand what impact applying this mitigation could have on our use of Docker. So I spent some time over the past weekend reading about the related topics.
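To see whether the mitigation is already in effect on a machine, the sysctl value can be read back. Below is a minimal Python sketch, not taken from the Ubuntu page. It assumes the setting is exposed at /proc/sys/kernel/unprivileged_userns_clone (sysctl keys normally map onto /proc/sys this way), and the function name is my own. As far as I know this sysctl comes from a distribution patch rather than the upstream kernel, so on other systems the file may simply not exist.

```python
from pathlib import Path

# Assumed location of the sysctl; it may be absent on kernels without the patch.
SYSCTL_PATH = Path("/proc/sys/kernel/unprivileged_userns_clone")


def unprivileged_userns_state(path=SYSCTL_PATH):
    """Return 'enabled', 'disabled', 'unknown', or 'unavailable' for the sysctl."""
    try:
        value = path.read_text().strip()
    except OSError:
        # File missing or unreadable: the knob is not available here.
        return "unavailable"
    if value == "1":
        return "enabled"
    if value == "0":
        return "disabled"
    return "unknown"


if __name__ == "__main__":
    print("unprivileged user namespaces:", unprivileged_userns_state())
```

A result of "disabled" would mean the mitigation is already applied; "unavailable" only says that the sysctl is not present, not that user namespaces are blocked.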
I read the following documents:
- [2] Namespaces in operation, part 1: namespaces overview and the subsequent articles in this series, by Michael Kerrisk.
- [3] Understanding user namespaces by Michael Kerrisk.
- [4] Docker: Seccomp security profiles for Docker
- [5] Docker: Isolate containers with a user namespace
- [6] Docker: Run the Docker daemon as a non-root user (Rootless mode)
I haven’t yet read, but want to read, the blog post [7] The Discovery and Exploitation of CVE-2022-25636 by Nick Gregory, who discovered CVE-2022-25636. I also tried the code in [8] Bonfee/CVE-2022-25636.
My findings and understanding so far are as follows:
- 1). Among all the namespaces that the kernel has implemented, the user namespace doesn’t require root privilege to create, while all the other namespaces (mount points, process IDs, network, etc.) require root privilege (more precisely, the CAP_SYS_ADMIN capability) to create.
- 2). The danger of the user namespace is: the first process in the namespace is given root privilege inside that namespace, because this first process is considered to be the “init” process for that user namespace. Root privilege is given so that this process can properly initialize the namespace for the subsequently created child processes.
- a). A regular Ubuntu system, when booted up, has the process init whose PID is 1. This first process in the newly created user namespace plays a very similar role to the system-wide init process.
- 3). Usually, the root privilege of this first process should not be a problem because its power is limited to the resources in the namespaces the process is in. But I guess the security issue here might be that either this root privilege is somehow expanded to the entire system rather than staying confined to the new user namespace, or the kernel is confused and gives the process the privilege to manipulate resources outside the enclosing namespaces. I haven’t read into the technical details yet.
- 4). Allowing the unprivileged user to create user namespaces is similar to the gate that hackers must enter in order to exploit this security loophole. What the workaround sudo sysctl kernel.unprivileged_userns_clone=0 does is close this gate so the hackers won’t be able to enter in the first place.
- 5). When kernel.unprivileged_userns_clone is enabled, the code demo_userns.c (used in Part 5: User namespaces of [2]) can be run without problem; once kernel.unprivileged_userns_clone is disabled, running demo_userns.c will result in the error “clone: Operation not permitted”.
- 6). Regarding the impact on Docker:
- a). [4] says Docker’s default seccomp profile blocks a number of syscalls, and it specifically mentions creating new namespaces: “Deny cloning new namespaces. Also gated by CAP_SYS_ADMIN for CLONE_* flags, except CLONE_NEWUSER.” My understanding of this sentence is:
- i). First, creating new namespaces, including user namespaces, is completely denied inside a Docker container.
- ii). This denial is also implemented by the fact that containers by default don’t have CAP_SYS_ADMIN (belt and braces). But because the creation of a user namespace doesn’t need root privilege at all, the removal of CAP_SYS_ADMIN by itself doesn’t help block creating new user namespaces. In sum, creation of user namespaces is still blocked, but it’s not blocked by the removal of CAP_SYS_ADMIN.
- iii). What I read above also gave me some confidence that setting kernel.unprivileged_userns_clone=0 may not affect the use of Docker.
- c). If the Docker daemon is run as root, kernel.unprivileged_userns_clone=0 may not affect Docker because in this case the daemon is not run as an unprivileged user. However, if [6] is implemented, which sounds like the Docker daemon is run as an unprivileged user [...]
[...] one code in [8]:
- a). The author said he ran the code on kernel 5.13.0-30 and could reproduce the exploit with a success rate of 40%. However, I compiled his code on my Ubuntu 20.04.1 VM, which also runs kernel 5.13.0-30, but couldn’t reproduce his hack. I’m not sure whether that is because I’m running it in a VM.
- b). [1] says the affected kernel versions are “5.4 through 5.6.10”, but [8] reproduced the hack using kernel 5.13.0-30, which is outside the affected range. I asked the author about this on Twitter but so far (2022-03-21T19:20:00-0400) I haven’t received a reply (expected, because it’s 12:20AM in Italy).
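The behavior described in finding 5) can also be probed without compiling demo_userns.c. The following is a rough Python sketch, assuming Linux and a libc that exports unshare(2). It is a stand-in for Michael Kerrisk's program, not a port of it: that program uses clone(), while this sketch calls unshare() from a forked child. The helper name is hypothetical.

```python
import ctypes
import errno
import os

CLONE_NEWUSER = 0x10000000  # flag value from <linux/sched.h>


def try_create_user_namespace():
    """Fork a child that tries unshare(CLONE_NEWUSER).

    Returns 0 if the child entered a new user namespace, the errno value
    if unshare() failed, or 255 if the attempt could not be made at all.
    """
    pid = os.fork()
    if pid == 0:
        code = 255  # fallback if libc or unshare() cannot be loaded
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            rc = libc.unshare(CLONE_NEWUSER)
            code = 0 if rc == 0 else ctypes.get_errno()
        finally:
            # Leave the child immediately so it never runs the parent's code.
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else 255


if __name__ == "__main__":
    result = try_create_user_namespace()
    if result == 0:
        print("user namespace created")
    else:
        print("could not create user namespace:",
              errno.errorcode.get(result, result))
```

With kernel.unprivileged_userns_clone=0 and an unprivileged user, I would expect the failure to be EPERM, matching the “Operation not permitted” error above. I would expect the same inside a container running under Docker's default seccomp profile, but I have not verified either case across kernels.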