I’m not Docker’s biggest fan, but I do see its benefits. I also think it has somehow managed to hide what Docker really is and what it really does. This post isn’t about Docker, though; it’s about namespaces.

Docker and LXC rely on a kernel feature, namespaces, that in the simplest terms lets a process be isolated at multiple levels (PIDs, filesystem mounts, hostnames, and so on).

So the question is: if a Docker/LXC container is just a process, how different is it from a normal process, say, a clone() that runs “ls”?

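Start with a normal ls. We all know that on Linux, glibc’s fork() is actually implemented on top of the clone() system call. Without extra flags, clone gives the child its own (copy-on-write) copy of the parent’s memory, plus copies of other state such as open file descriptors; the flags decide what is shared and what is copied.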

Now take a clone() call that executes a containerized “ls”. The two are pretty much the same, but not quite: there are some extra flags, and those flags are what provide the isolation.

Let’s pick CLONE_NEWPID as an example (see man 7 namespaces and man 7 pid_namespaces).


The man page says: “PID namespaces isolate the process ID number space, meaning that processes in different PID namespaces can have the same PID. PID namespaces allow containers to provide functionality such as suspending/resuming the set of processes in the container and migrating the container to a new host while the processes inside the container maintain the same PIDs.”

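That makes sense. I guess it also keeps PID numbers low, at least as seen from inside a container: on a big host running plenty of containers and apps, PIDs would otherwise climb quite high (each process still has a PID in the host’s namespace as well, though). We can verify that numbering restarts by running ps inside a new PID namespace. I got ps on PID 2, which is pretty neat. The first process created in a new PID namespace gets PID 1, so ps landing on PID 2 suggests something else, probably a shell, was PID 1 and ps was started from it.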

There are many namespace flags you can pass to clone, and since clone takes a function as one of its arguments, you can “containerize” pretty much anything. The child_main() function handed to clone can be any ordinary C function.

I hope that explains the internals of namespaces a bit. There’s one missing piece, which is how namespaces are combined with cgroups; maybe that’s for another article.