21 March 2022

What I have learned so far about the Ubuntu mitigation on CVE-2022-25636

by yaobin.wen

The other day I learned about CVE-2022-25636 [1], a security issue in the Linux netfilter subsystem. The official Ubuntu web page for this issue provides the following mitigation:

Disable unprivileged user namespaces to restrict access to privileged
users (have CAP_NET_ADMIN) via the kernel.unprivileged_userns_clone
sysctl:
  $ sudo sysctl kernel.unprivileged_userns_clone=0
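
To see which state a machine is currently in before changing anything, the sysctl can be read back. The following is a minimal sketch, not something from the Ubuntu page: it assumes the sysctl is exposed at /proc/sys/kernel/unprivileged_userns_clone (the usual mapping of a sysctl name to a /proc/sys path), and the function names are made up for illustration. On kernels that don't carry this Debian/Ubuntu-specific knob the file is simply absent.

```python
from pathlib import Path

# Assumed location: sysctl names map to /proc/sys with dots turned into slashes.
SYSCTL_PATH = Path("/proc/sys/kernel/unprivileged_userns_clone")


def read_userns_clone_setting(path=SYSCTL_PATH):
    """Return the sysctl value as an int, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except OSError:
        # Missing file (kernel without this knob) or no permission to read it.
        return None


def describe(value):
    """Turn the raw value into a short human-readable description."""
    if value is None:
        return "unknown: the sysctl is not available on this kernel"
    if value == 0:
        return "disabled: unprivileged users cannot create user namespaces"
    return "enabled: unprivileged users can create user namespaces"


if __name__ == "__main__":
    print(describe(read_userns_clone_setting()))
```

Reading the value needs no special privilege; only changing it requires sudo, as in the mitigation quoted above.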

My work project uses Docker, and we wanted to understand how applying this mitigation might affect our use of Docker. So I spent some time over the past weekend reading about the related topics.

I read the following documents:

[2] Namespaces in operation, part 1: namespaces overview and the subsequent articles in this series, by Michael Kerrisk.
[3] Understanding user namespaces by Michael Kerrisk.
[4] Docker: Seccomp security profiles for Docker
[5] Docker: Isolate containers with a user namespace
[6] Docker: Run the Docker daemon as a non-root user (Rootless mode)

I haven’t yet read, but want to read, the blog post [7] The Discovery and Exploitation of CVE-2022-25636 by Nick Gregory, who discovered CVE-2022-25636. I also tried the code in [8] Bonfee/CVE-2022-25636.

My findings and understanding so far are as follows:

1). Among all the namespaces that the kernel implements, the user namespace is the only one that an unprivileged process can create; all the other namespaces (mount points, process IDs, network, etc.) require root privilege (more precisely, CAP_SYS_ADMIN) to create.
2). The danger of the user namespace is that the first process in the namespace is given root privilege inside it. This first process is treated as the “init” process for that user namespace, so it is given root privilege in order to initialize the namespace properly for the child processes created later.
- a). A regular Ubuntu system, when booted, has the init process whose PID is 1. The first process in a newly created user namespace plays a role very similar to that of the system-wide init process.
3). Usually, the root privilege of this first process should not be a problem because its power is limited to the resources in the namespaces the process belongs to. My guess is that the security issue here is one of two things: either this root privilege somehow extends to the entire system rather than staying inside the new user namespace, or the kernel gets confused and lets the process manipulate resources outside its enclosing namespaces. I haven’t read into the technical details yet.
4). Allowing unprivileged users to create user namespaces is like a gate that attackers must pass through in order to exploit this vulnerability. The workaround sudo sysctl kernel.unprivileged_userns_clone=0 closes this gate so that attackers can’t get in in the first place.
5). When kernel.unprivileged_userns_clone is enabled, the program demo_userns.c (used in Part 5: User namespaces of [2]) runs without problems; once kernel.unprivileged_userns_clone is disabled, running it fails with the error “clone: Operation not permitted”.
6). Regarding the impact on Docker:
- a). [4] says Docker’s default seccomp profile blocks a number of syscalls, and it specifically mentions creating new namespaces: “Deny cloning new namespaces. Also gated by CAP_SYS_ADMIN for CLONE_* flags, except CLONE_NEWUSER.” My understanding of this sentence is:
  - i). First, creating new namespaces, including user namespaces, is completely denied inside a Docker container.
  - ii). For most namespaces, this denial is backed up by the fact that containers by default don’t have CAP_SYS_ADMIN (belt and braces). But because creating a user namespace doesn’t need root privilege at all, removing CAP_SYS_ADMIN by itself doesn’t block the creation of new user namespaces. In sum, creating a user namespace is still blocked, but by the seccomp profile, not by the removal of CAP_SYS_ADMIN.
  - iii). What I read above also gave me some confidence that setting kernel.unprivileged_userns_clone=0 may not affect the use of Docker.
- c). If the Docker daemon is run as root, kernel.unprivileged_userns_clone=0 may not affect Docker because in this case the daemon is not run as an unprivileged user. However, if [6] is implemented, which sounds like the Docker daemon is run as an unprivileged user [...]
7). Regarding the code in [8]:
- a). The author said he ran the code on kernel 5.13.0-30 and could reproduce the exploit with a success rate of 40%. However, I compiled his code on my Ubuntu 20.04.1 VM, which also runs kernel 5.13.0-30, but couldn’t reproduce it. I’m not sure whether that’s because I’m running it in a VM.
- b). [1] says the affected kernel versions are “5.4 through 5.6.10”, but [8] reproduced the exploit on kernel 5.13.0-30, which is outside the affected range. I asked the author about this on Twitter, but as of 2022-03-21T19:20:00-0400 I haven’t received a reply (expected, because it’s 12:20 AM in Italy).
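
The behavior described in finding 5 can also be probed without compiling demo_userns.c. The sketch below is my own illustration, not the code from [2]: it swaps clone() for unshare(CLONE_NEWUSER) called through ctypes in a forked child, on the assumption that the sysctl gates both paths in the same way. The function names are hypothetical, and the CLONE_NEWUSER value is taken from <linux/sched.h>.

```python
import ctypes
import os

CLONE_NEWUSER = 0x10000000  # flag value from <linux/sched.h>


def try_create_user_namespace():
    """Try unshare(CLONE_NEWUSER) in a child process.

    Returns 0 on success, the errno value on failure, -1 if the child
    did not exit normally, or None if unshare() is not available here.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    unshare = getattr(libc, "unshare", None)
    if unshare is None:
        return None
    pid = os.fork()
    if pid == 0:
        # Child: only this process would move into the new namespace,
        # so the parent is left untouched whatever happens.
        code = 255
        try:
            rc = unshare(CLONE_NEWUSER)
            code = 0 if rc == 0 else ctypes.get_errno()
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -1


def explain(result):
    """Describe the result of try_create_user_namespace()."""
    if result is None:
        return "unshare() is not available on this platform"
    if result == 0:
        return "allowed: a new user namespace was created"
    if result < 0:
        return "inconclusive: the child process terminated abnormally"
    return "denied: " + os.strerror(result)


if __name__ == "__main__":
    print(explain(try_create_user_namespace()))
```

Running this as a regular user on the host and again inside a container with Docker’s default seccomp profile should show whether user namespace creation is blocked in each place. A denial can have other causes than this sysctl, so the errno text is only a hint.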

Tags: Tech

Copyright © 2016-2023 Yaobin Wen All rights reserved.

If present, the "License" field in the articles overrides the copyright disclaimer. See "LICENSE.md" in the GitHub repository for more details.