Sunday, 3 April 2016

Component architecture of "Minimal Linux Live"

Yesterday I published a new release of Minimal Linux Live. Given the extensive changes in this version, I decided to write an article explaining the internal component architecture of the generated operating system.

First, let's look at how the system was structured before the latest changes.

Minimal Linux Live - component architecture before version 03-Apr-2016
The above diagram shows all major components and their build dependencies in Minimal Linux Live before version 03-Apr-2016. Please note that the "Minimal Linux Live ISO image" component refers to the overall build process, not to the actual file/folder structure of the ISO image.

We have a host operating system which we use as our build environment. The host provides the build toolchain (most notably the C compiler), the default C library (almost always glibc), the kernel headers, and the tools needed to generate bootable ISO images (ISOLINUX).

This is the overall build process:
  1. Kernel sources are downloaded and then the kernel is compiled. Note that kernel headers are *not* generated during this phase.

  2. BusyBox sources are downloaded and then the BusyBox binary is compiled. BusyBox depends heavily on the kernel headers and these headers are taken from the host operating system. Do you see the architectural problem here? The host kernel headers may not be compatible with the actual kernel that we have already produced in the previous step.

    This may lead to unexpected issues because we can choose to build any Linux kernel version, while the host kernel headers correspond to one specific version. In practice there are almost never compile-time incompatibilities, because OS maintainers put a lot of effort into "polishing" the host kernel headers and making them as generic as possible.

    BusyBox also depends on the C library (glibc), and the library functions are statically linked from the glibc provided by the host machine. The only issue is that glibc is designed in such a way that it is never entirely static: even when we link statically, glibc still tries to load the NSS (Name Service Switch) libraries at runtime, and since these libraries cannot be resolved on the generated system, DNS functionality is broken.

  3. The final step is to generate the ISO image. The major architectural issue is that ISO generation depended on the kernel build infrastructure (the kernel's Makefile provides a target for ISO image generation), and the default assumptions in that target were not always compatible with the host environment.
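
The DNS problem in step 2 comes from glibc's Name Service Switch (NSS): at runtime glibc reads /etc/nsswitch.conf and, for every service listed for a database (for example "hosts: files dns"), loads a shared module with dlopen(). That is why the linker warns, when such lookup functions are used in a statically linked program, that the shared libraries from the glibc version used for linking are required at runtime. The Python sketch below shows which module files a "hosts:" line would pull in. It assumes the classic libnss_<service>.so.2 naming convention used by glibc at the time, and the function name nss_modules is hypothetical.

```python
import re


def nss_modules(nsswitch_line: str) -> list[str]:
    """Map one nsswitch.conf line to the NSS module files glibc would try to load.

    Assumes the classic glibc naming convention libnss_<service>.so.2.
    """
    line = nsswitch_line.split("#", 1)[0]  # drop comments
    if ":" not in line:
        return []
    _database, services = line.split(":", 1)
    # Remove action specifications such as [NOTFOUND=return].
    services = re.sub(r"\[[^\]]*\]", " ", services)
    return [f"libnss_{service}.so.2" for service in services.split()]


if __name__ == "__main__":
    # Look in /etc/hosts first, then ask DNS.
    print(nss_modules("hosts: files dns"))  # ['libnss_files.so.2', 'libnss_dns.so.2']
```

Since the old ISO image did not contain these shared objects, a statically linked BusyBox had nothing to load for the "dns" service and name resolution failed.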

Even though there are many host dependencies, the build succeeds for almost all users, and there are well-known workarounds for the minority who have difficulties building their own version of Minimal Linux Live. However, since DNS is broken, the overall user experience suffers. This forced me to rethink the whole architecture, and the resulting architectural changes resolved the DNS issue.


Minimal Linux Live - component architecture in version 03-Apr-2016
The above diagram shows all major components and their build dependencies in Minimal Linux Live version 03-Apr-2016. Please note that the "Minimal Linux Live ISO image" component refers to the overall build process, not to the actual file/folder structure of the ISO image.

This time the only major host dependencies are the toolchain and the ISO generation software (ISOLINUX).

This is the overall build process:
  1. Kernel sources are downloaded and then the kernel is compiled. The final step in this phase is to generate the kernel headers, which are used in the next phases.

  2. The glibc sources are downloaded and all glibc shared objects are generated. Note that the glibc build process explicitly depends on the kernel headers which we have produced in the previous phase. In this way we guarantee 100% compatibility between the generated kernel binary and the generated glibc shared objects.

  3. BusyBox sources are downloaded and the BusyBox binary is compiled. This time the build depends on the glibc shared objects produced in phase 2 and, indirectly, on the kernel headers produced in phase 1. I say "indirectly" because there is an interim step in which we configure the glibc infrastructure to point to the kernel headers generated in phase 1.

    Since we now have full control over the glibc build process, we use dynamic linking and then copy only the glibc libraries that the BusyBox binary needs to work properly. We also copy the necessary NSS libraries in order to fix DNS resolution. Last but not least, glibc generates the so-called Linux dynamic loader, which we also copy; without it the BusyBox binary won't run at all.

  4. The ISO image generation process still depends on ISOLINUX, which is provided by the host. The architectural difference is that this phase no longer depends on the kernel build infrastructure, which gives us precise control over the ISO file/folder structure.
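
The library-copying part of phase 3 can be sketched as follows. This is not the Minimal Linux Live script itself, only a minimal Python sketch under stated assumptions: it parses text in the format printed by ldd for a dynamically linked binary, keeps every resolved library path plus the dynamic loader (which ldd prints as a bare path), and then adds the NSS modules, which ldd cannot report because glibc loads them with dlopen() only when a lookup happens. The function files_to_copy, the /glibc/lib directory and the sample output are hypothetical.

```python
import re

# "libc.so.6 => /path/libc.so.6 (0x...)": an ordinary shared library.
_LIB_RE = re.compile(r"^\s*\S+\s*=>\s*(/\S+)")
# "/lib64/ld-linux-x86-64.so.2 (0x...)": the dynamic loader, printed as a bare path.
_LOADER_RE = re.compile(r"^\s*(/\S+)")


def files_to_copy(ldd_output: str, nss_modules: list[str], libdir: str) -> list[str]:
    """Collect the shared objects a dynamically linked binary needs at runtime."""
    files = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            raise ValueError(f"unresolved dependency: {line.strip()}")
        match = _LIB_RE.match(line) or _LOADER_RE.match(line)
        if match:  # virtual objects such as linux-vdso.so.1 have no path
            files.append(match.group(1))
    # NSS modules are dlopen()ed at runtime, so ldd never reports them.
    files.extend(f"{libdir}/{name}" for name in nss_modules)
    return list(dict.fromkeys(files))  # drop duplicates, keep order


# Made-up ldd output for a hypothetical dynamically linked BusyBox.
SAMPLE = """\
\tlinux-vdso.so.1 (0x00007ffd5a3f2000)
\tlibm.so.6 => /glibc/lib/libm.so.6 (0x00007f2a1c000000)
\tlibc.so.6 => /glibc/lib/libc.so.6 (0x00007f2a1bc00000)
\t/lib64/ld-linux-x86-64.so.2 (0x00007f2a1c400000)
"""

if __name__ == "__main__":
    for path in files_to_copy(SAMPLE, ["libnss_files.so.2", "libnss_dns.so.2"], "/glibc/lib"):
        print(path)
```

In a real build the NSS modules have shared-library dependencies of their own, so a fuller version would also run ldd on each of them and merge the results.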

The new architecture may look more complex than before, but on closer inspection the dependencies are the same as before; the difference is that we now build some of them ourselves in advance. The new architecture is also an important prerequisite for introducing a real root file system on the ISO image, because right now the whole Minimal Linux Live runtime environment lives entirely in RAM.
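
One way to check the claim that the dependencies stayed the same is to write both diagrams down as dependency graphs. In this Python sketch the component names and the "host:" prefix for inputs supplied by the build machine are a simplification of the diagrams made for illustration; the helpers host_deps and build_order are hypothetical, while graphlib.TopologicalSorter is part of the Python standard library since version 3.9.

```python
from graphlib import TopologicalSorter

# Each component maps to the set of things it needs before it can be built.
# Entries prefixed with "host:" are supplied by the build machine.
OLD = {
    "kernel": {"host:toolchain"},
    "busybox": {"host:toolchain", "host:kernel-headers", "host:glibc"},
    "iso": {"kernel", "busybox", "host:isolinux"},
}

NEW = {
    "kernel": {"host:toolchain"},
    "kernel-headers": {"kernel"},  # generated at the end of phase 1
    "glibc": {"host:toolchain", "kernel-headers"},
    "busybox": {"host:toolchain", "glibc", "kernel-headers"},
    "iso": {"kernel", "busybox", "host:isolinux"},
}


def host_deps(graph: dict[str, set[str]]) -> set[str]:
    """Inputs that must be provided by the host."""
    return {dep for deps in graph.values() for dep in deps if dep.startswith("host:")}


def build_order(graph: dict[str, set[str]]) -> list[str]:
    """A valid order in which to build the non-host components."""
    order = TopologicalSorter(graph).static_order()
    return [node for node in order if not node.startswith("host:")]


if __name__ == "__main__":
    print(sorted(host_deps(NEW)))  # ['host:isolinux', 'host:toolchain']
    # Host inputs of the old design that the new design builds itself.
    moved = {dep.removeprefix("host:") for dep in host_deps(OLD) - host_deps(NEW)}
    print(sorted(moved))  # ['glibc', 'kernel-headers']
    print(build_order(NEW))  # ['kernel', 'kernel-headers', 'glibc', 'busybox', 'iso']
```

The two host inputs that disappear, the kernel headers and glibc, show up as ordinary nodes of the new graph, and the topological order reproduces the phase order described above.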
