Isolating operating system processes 🐧📦
Process Isolation : Part 1 : Isolation from the parent process, a new root and namespaces.
An operating system process is a single execution of a task. That execution depends on an environment containing the resources it needs to run successfully. Isolating such a process takes several steps: isolating the process's view of the outside world, then its dependencies, then the resources it needs to run, then its access to different features of the underlying system, then communication between isolated processes. We are getting a little out of hand here.
We are going to start with the first step of process isolation: isolating the view of a process.
Process isolation and containerization
Containerization at its core is process isolation. At any given point, a process comprises the program that is running, the memory allocated to it, the CPU state, a list of open files and other resources such as IO devices. To isolate a process we can use tools provided by the operating system kernel.
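As a rough illustration of that list, Linux exposes most of these per-process resources under /proc. The sketch below assumes a Linux system with /proc mounted. show_process is a hypothetical helper name, and the State and VmRSS fields are only a small sample of what /proc/PID/status holds.

```shell
# show_process: hypothetical helper that prints a few of the resources
# the kernel tracks for one process, read from /proc.
show_process() {
  pid="$1"
  # The program being executed
  echo "program: $(readlink "/proc/$pid/exe")"
  # Scheduler state and resident memory
  grep -E '^(State|VmRSS):' "/proc/$pid/status"
  # Number of open file descriptors
  echo "open files: $(ls "/proc/$pid/fd" | wc -l)"
}

# Inspect the current shell
show_process "$$"
```

Everything printed here comes from the host's view of the process, which is exactly the view the rest of this post tries to narrow.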
Kernel features for isolating process views (namespaces)
15th November 2024
Process isolation in Linux relies on certain Linux kernel features, namely namespaces, to isolate the views of processes from one another.
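Every process already belongs to one namespace of each type, and the kernel shows which ones under /proc/PID/ns. A minimal sketch for looking at them, assuming /proc is mounted (ns_id is a hypothetical helper name):

```shell
# ns_id: hypothetical helper that prints the identifier of one namespace
# (pid, mnt, net, uts, ...) for a process, defaulting to the caller.
ns_id() {
  kind="$1"
  pid="${2:-self}"
  readlink "/proc/$pid/ns/$kind"
}

ns_id pid   # prints something of the form pid:[<inode number>]
ns_id mnt
```

Two processes share a namespace exactly when these identifiers match, so comparing the values inside and outside a container is a quick way to confirm that isolation took effect.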
16th November 2024
A new root
Let’s create a new root directory for our isolated process. From now on we are also going to refer to this process as a container for the sake of brevity. As we are starting from scratch in a new environment, we will make sure we have at least some of the basics. After creating the directory structure for the container, we copy in the bash and ls commands for some initial navigation through the environment.
mkdir new_root
mkdir -p new_root/{bin,lib,lib64}
# copy the commands you want in your container
cp /bin/{bash,ls} new_root/bin/
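The same setup can be wrapped in a small function so that it is easy to repeat. This is only a sketch: make_root is a hypothetical name, and the extra proc directory is an assumption added here because a proc filesystem gets mounted at that path later in this post.

```shell
# make_root: hypothetical helper that creates the skeleton of a new root.
# The proc directory is an addition; it is the mount point used later on.
make_root() {
  root="$1"
  for dir in bin lib lib64 proc; do
    mkdir -p "$root/$dir"
  done
}

# Usage (commented out so that sourcing this file has no side effects):
# make_root ./new_root
```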
Let’s give up access to everything under our current root directory and jump into the new root directory, where we have a view of nothing except the commands ls and bash.
Let’s use chroot to change our root to the new directory, new_root.
sudo chroot ./new_root /bin/bash
chroot: failed to run command ‘/bin/bash’: No such file or directory
Here /bin/bash fails to run even though we just copied it. The message is misleading: what is missing is not bash itself but the dynamic linker that bash names as its interpreter, which resolves shared library dependencies when a process starts. We have no idea yet which dependencies bash needs, but it would be great if it could tell us. So we first copy in the shared object for the dynamic linker.
cp /lib/ld-linux-aarch64.so.1 new_root/lib/
sudo chroot ./new_root /bin/bash
/bin/bash: error while loading shared libraries: libtinfo.so.6: cannot open shared object file: No such file or directory
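The first failure, "No such file or directory", came from the missing dynamic linker, and its path differs between architectures (the one used here is for aarch64). As a sketch, the path can be read from ldd, which prints the linker as the one entry whose first column is an absolute path. interp_of is a hypothetical helper name, and the parsing assumes glibc-style ldd output.

```shell
# interp_of: hypothetical helper that prints the dynamic linker path
# ldd reports for a binary (lines whose first field starts with "/").
interp_of() {
  ldd "$1" | awk '$1 ~ /^\// {print $1}'
}

interp_of "$(command -v ls)"
```

On the machine used in this post, running it against /bin/bash would be expected to print /lib/ld-linux-aarch64.so.1, matching the ldd listing further down.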
After copying the dynamic linker, running /bin/bash in the new environment fails with a different error, this time about an unavailable dependency. Let’s check which dependencies our two programs, bash and ls, have. The ldd command helps us do that.
ldd /bin/{bash,ls}
/bin/bash:
linux-vdso.so.1 (0x0000ffff8f630000)
libtinfo.so.6 => /lib/aarch64-linux-gnu/libtinfo.so.6 (0x0000ffff8f430000)
libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffff8f280000)
/lib/ld-linux-aarch64.so.1 (0x0000ffff8f5f7000)
/bin/ls:
linux-vdso.so.1 (0x0000ffff7ff02000)
libselinux.so.1 => /lib/aarch64-linux-gnu/libselinux.so.1 (0x0000ffff7fe50000)
libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffff7fca0000)
/lib/ld-linux-aarch64.so.1 (0x0000ffff7fec9000)
libpcre2-8.so.0 => /lib/aarch64-linux-gnu/libpcre2-8.so.0 (0x0000ffff7fc00000)
Copy these dependencies to the to-be-isolated root.
cp /lib/aarch64-linux-gnu/libtinfo.so.6 /lib/aarch64-linux-gnu/libc.so.6 /lib/ld-linux-aarch64.so.1 /lib/aarch64-linux-gnu/libselinux.so.1 /lib/aarch64-linux-gnu/libpcre2-8.so.0 new_root/lib/
Now that the necessary dependencies are copied, let’s change root into the new root again.
sudo chroot ./new_root /bin/bash
bash-5.1# ls
bin lib lib64
We are able to run the commands we copied in, but not the ones we didn’t, such as ps. Let’s try ps anyway and see which processes are visible.
bash-5.1# ps -aux
bash: ps: command not found
Let’s exit the container, copy the command and its dependencies, and then try the same again.
bash-5.1# exit
Copy the command.
cp /bin/ps ./new_root/bin/
Use the following command to print the locations of the dependencies only so that we can cycle through them.
ldd /bin/ps | awk '{print $3}' | grep -v '^$'
/lib/aarch64-linux-gnu/libprocps.so.8
/lib/aarch64-linux-gnu/libc.so.6
/lib/aarch64-linux-gnu/libsystemd.so.0
/lib/aarch64-linux-gnu/liblzma.so.5
/lib/aarch64-linux-gnu/libzstd.so.1
/lib/aarch64-linux-gnu/liblz4.so.1
/lib/aarch64-linux-gnu/libcap.so.2
/lib/aarch64-linux-gnu/libgcrypt.so.20
/lib/aarch64-linux-gnu/libgpg-error.so.0
Now, let’s copy the above to our new root using the following command.
for dep in $(ldd /bin/ps | awk '{print $3}' | grep -v '^$'); do cp --parents "$dep" ./new_root; done
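Copying a binary and then looping over its ldd output is a pattern that repeats for every command brought into the container, so it can be folded into one helper. The sketch below is an assumption-laden convenience, not part of the original steps: copy_with_deps is a hypothetical name, it relies on GNU cp for --parents, and it parses glibc-style ldd output. Unlike the third-column filter above, it also picks up the dynamic linker line, which ldd prints without "=>".

```shell
# copy_with_deps: hypothetical helper that copies a binary and every
# library path reported by ldd into a new root, preserving directories.
copy_with_deps() {
  bin="$1"
  root="$2"
  mkdir -p "$root"
  cp --parents "$bin" "$root"
  # "name => /path (addr)" lines carry the path in field 3;
  # the dynamic linker line carries it in field 1.
  ldd "$bin" | awk '
    $2 == "=>" && $3 ~ /^\// {print $3}
    $1 ~ /^\// {print $1}
  ' | sort -u | while read -r lib; do
    cp --parents "$lib" "$root"
  done
}

# Usage (commented out; paths as used in this post):
# copy_with_deps /bin/ps ./new_root
```

Because cp --parents recreates the full source path, libraries land under new_root/lib/aarch64-linux-gnu/ rather than directly in new_root/lib/, the same as with the loop above.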
Now, let’s run our ps command to see what happens next.
sudo chroot ./new_root /bin/bash
bash-5.1# ps
Error, do this: mount -t proc proc /proc
bash-5.1# exit
exit
OK, it seems the command `ps` knows how to help us solve this problem. Let’s copy the mount command into our container along with its dependencies.
cp /bin/mount ./new_root/bin/
for dep in $(ldd /bin/mount | awk '{print $3}' | grep -v '^$'); do cp --parents "$dep" ./new_root; done
Now, let’s run the mount command in the container, since we copied it to do what the ps process told us to do.
sudo chroot ./new_root /bin/bash
bash-5.1# mount
mount: failed to read mtab: No such file or directory
bash-5.1# exit
Now it looks like we need an mtab. /etc/mtab is a symlink, and we can see the chain below.
ll /etc/mtab
lrwxrwxrwx 1 root root 19 Oct 2 07:43 /etc/mtab -> ../proc/self/mounts
Even /proc/mounts is a symlink.
ll /proc/mounts
lrwxrwxrwx 1 root root 11 Nov 15 21:04 /proc/mounts -> self/mounts
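The two listings can be reproduced hop by hop with readlink. The helper below is a sketch (link_chain is a hypothetical name) that only follows a symlink in the final path component, which is all this chain needs, and it does not guard against symlink loops.

```shell
# link_chain: hypothetical helper that prints each hop of a symlink chain.
link_chain() {
  path="$1"
  while [ -L "$path" ]; do
    target=$(readlink "$path")
    echo "$path -> $target"
    case "$target" in
      /*) path="$target" ;;                      # absolute target
      *)  path="$(dirname "$path")/$target" ;;   # relative to the link
    esac
  done
}

link_chain /proc/mounts   # prints: /proc/mounts -> self/mounts
link_chain /etc/mtab
```

Both chains end at a file inside /proc, which explains why mount cannot read its table in a root where no proc filesystem is mounted.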
Both symlinks end inside /proc, so the ps and mount commands should both work once a proc filesystem is mounted at /proc inside new_root. The mount point has to exist first, so we create it and then mount.
mkdir -p ./new_root/proc
sudo mount -t proc wavey ./new_root/proc
Run the mount command inside the chroot environment. The command works properly now that /proc is mounted.
bash-5.1# mount
wavey on /proc type proc (rw,relatime)
Run the ps command inside the chroot environment. The command works well, but we still have a view of all the other processes running on the system. When isolating a process it is important to ensure that the process doesn’t have a lens into the environment outside it.
ps -aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
0 1 0.0 0.2 167900 11144 ? Ss Nov16 0:11 /sbin/init
...
To ensure that a process has access only to certain files and only has a view into its own functioning as a process, we are going to use namespaces.
Namespaces
(At this point I was having a hard time figuring out the usage of the unshare command. Eric Chiang’s blog on containers from scratch really helped me get a clearer idea.)
In the above run of ps, we get back process information for all the other processes running on the underlying system. Let’s put this process in a namespace and ensure that this doesn’t happen. The command below lets us run a command in a new namespace.
sudo unshare --pid --fork --mount-proc=$PWD/new_root/proc chroot ./new_root /bin/bash
As you can see above, we are using the --fork, --pid, and --mount-proc flags.
PID Namespace
With --pid, unshare creates a new PID namespace. The process that creates the namespace does not move into it, though, and its own identifier does not change. Only the children it creates afterwards are placed in the new namespace, and the first of them becomes PID 1 there. That is what --fork is for: unshare forks a child and runs our command in it, so that the command becomes PID 1 in its namespace. When creating a PID namespace with --pid and not providing the --fork flag, we get the following error.
The shell is not able to fork further children.
bash-5.1# ps
PID TTY TIME CMD
2060 ? 00:00:00 sudo
2061 ? 00:00:00 bash
2062 ? 00:00:00 ps
bash-5.1# ps
bash: fork: Cannot allocate memory
bash-5.1# ls
bash: fork: Cannot allocate memory
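The shell is able to run exactly one command and nothing after it. Without --fork, bash itself stays in the original PID namespace, and the first process it forks becomes PID 1 of the new one. When that process exits, the namespace has lost its init process and the kernel refuses to create any more processes in it, which is why every later fork fails with "Cannot allocate memory".
Being able to fork new processes, each with its own process identifier (PID), is essential, which makes --fork an important part of creating a PID namespace.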
Now that we have isolated the PIDs, it’s time to isolate the view of the processes.
Process Namespace
Mounting a procfs
Sadly, the view of the other processes arrives together with the proc mount we brought in to see our own processes. Let’s make a new proc virtual filesystem mount called wavey on /proc of our new root.
sudo mount -t proc wavey ./new_root/proc
With that, we have a proc virtual filesystem at /proc in the new root. It was mounted from the host's PID namespace, so it still describes every process on the host, but it gives us a base for the namespace-specific procfs we need.
After doing the above, entering the process jail and running mount to list the mounts in the container gives the output below.
bash-5.1# mount
wavey on /proc type proc (rw,relatime)
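Let’s create a procfs specifically for the namespace we are creating with unshare, using the --mount-proc flag. This flag also implies a new mount namespace, so the extra proc mount stays private to the container. The full command is below.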
sudo unshare --pid --fork --mount-proc=$PWD/new_root/proc chroot ./new_root /bin/bash
The container now gets a procfs that has no information about processes outside its PID namespace. Let’s run the mount command again.
bash-5.1# mount
wavey on /proc type proc (rw,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
The second entry above is the procfs mount we just did. When we run the ps command in the container we will see the following.
bash-5.1# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
0 1 0.0 0.0 3984 3200 ? S 19:18 0:00 /bin/bash
0 2 0.0 0.0 6408 2380 ? R+ 19:21 0:00 ps aux
No processes can be seen apart from the ones running in the context of this container.
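ps builds its list from the numeric directories in /proc, so counting them is a quick way to check what a process can see. A minimal sketch, with visible_pids as a hypothetical helper name:

```shell
# visible_pids: hypothetical helper that counts the process directories
# (entries named with digits only) in a proc mount.
visible_pids() {
  ls "$1" | grep -c '^[0-9][0-9]*$'
}

# On the host this counts every process on the system.
visible_pids /proc
```

Run on the host, this reports the full process count. Run inside the container started with the unshare command above, it should report only the shell and the short-lived processes of the pipeline itself.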
25th November 2024
With that, our process does not have any lens into the underlying system and is isolated from the other running processes. Now that the process view has been namespaced, we can namespace other aspects of this process as well.
Time Namespace
We would also like to have a virtualized view of time for our process.
If we enter the container and run uptime, we get the following output.
bash-5.1# uptime -p
up 3 hours, 30 minutes
If we would like to start our process about 9 years into the future, we can offset its boot-time clock by 300000000 seconds with the following command.
sudo unshare --fork --pid --time --boottime 300000000 --mount-proc=$PWD/new_root/proc chroot ./new_root /bin/bash
Running uptime afterwards gets us the following output.
bash-5.1# uptime -p
up 9 years, 28 weeks, 8 hours, 56 minutes
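To see where those numbers come from, the offset can be broken down by hand. The sketch below assumes that uptime -p counts 365-day years and reports weeks modulo 52, which is an assumption about its internals, although it is consistent with the output above. pretty_offset is a hypothetical helper name.

```shell
# pretty_offset: hypothetical helper that splits a number of seconds into
# years (365 days), weeks (mod 52), days (mod 7), hours and minutes.
pretty_offset() {
  s="$1"
  echo "$(( s / 31536000 )) years, $(( s / 604800 % 52 )) weeks, $(( s / 86400 % 7 )) days, $(( s / 3600 % 24 )) hours, $(( s / 60 % 60 )) minutes"
}

pretty_offset 300000000
# prints: 9 years, 28 weeks, 0 days, 5 hours, 20 minutes
```

Adding the host's real uptime of a little over three and a half hours to the 5 hours, 20 minutes of the offset lines up with the 8 hours, 56 minutes reported inside the time namespace.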
If you would like to learn more about Time Namespaces, check out the article below.
With that we have gotten a glimpse into creating a new root and a namespaced view for the process. In the next iteration, we are going to explore namespaces in Linux further and see what it means to isolate a process in terms of resource usage.