Tags: docker, containers, kubernetes, docker-swarm, flynn

Why is the Kubernetes source code an order of magnitude larger than other container orchestrators?


Compared with other orchestration tools such as Dokku, DC/OS, Deis, Flynn, and Docker Swarm, Kubernetes is in a different league in terms of lines of code: on average, those tools are around 100k-200k lines of code.

Intuitively, it feels strange that managing containers (checking health, scaling containers up and down, killing them, restarting them, etc.) should take 2.4M+ lines of code, which is the scale of an entire operating system code base. I feel like there is something more to it.

What is different in Kubernetes compared to other orchestration solutions that makes it so big?

I don't have any experience maintaining more than 5-6 servers. Please explain why it is so big and which functionality accounts for most of it.


Solution

  • First and foremost: don't be misled by the raw line count. Most of it is dependencies in the vendor folder (utilities, client libraries, gRPC, etcd, etc.), which are not part of the core logic.

    Raw LoC Analysis with cloc

    To put things into perspective, for Kubernetes:

    $ cloc kubernetes --exclude-dir=vendor,_vendor,build,examples,docs,Godeps,translations
        7072 text files.
        6728 unique files.                                          
        1710 files ignored.
    
    github.com/AlDanial/cloc v 1.70  T=38.72 s (138.7 files/s, 39904.3 lines/s)
    --------------------------------------------------------------------------------
    Language                      files          blank        comment           code
    --------------------------------------------------------------------------------
    Go                             4485         115492         139041        1043546
    JSON                             94              5              0         118729
    HTML                              7            509              1          29358
    Bourne Shell                    322           5887          10884          27492
    YAML                            244            374            508          10434
    JavaScript                       17           1550           2271           9910
    Markdown                         75           1468              0           5111
    Protocol Buffers                 43           2715           8933           4346
    CSS                               3              0              5           1402
    make                             45            346            868            976
    Python                           11            202            305            958
    Bourne Again Shell               13            127            213            655
    sed                               6              5             41            152
    XML                               3              0              0             88
    Groovy                            1              2              0             16
    --------------------------------------------------------------------------------
    SUM:                           5369         128682         163070        1253173
    --------------------------------------------------------------------------------
    

    For Docker (the engine itself, not Swarm or Swarm mode, since the engine includes features like volumes, networking, and plugins that are not in those repositories). Projects like Machine, Compose, and libnetwork are not included, so in reality the whole Docker platform might amount to many more LoC:

    $ cloc docker --exclude-dir=vendor,_vendor,build,docs
        2165 text files.
        2144 unique files.                                          
         255 files ignored.
    
    github.com/AlDanial/cloc v 1.70  T=8.96 s (213.8 files/s, 30254.0 lines/s)
    -----------------------------------------------------------------------------------
    Language                         files          blank        comment           code
    -----------------------------------------------------------------------------------
    Go                                1618          33538          21691         178383
    Markdown                           148           3167              0          11265
    YAML                                 6            216            117           7851
    Bourne Again Shell                  66            838            611           5702
    Bourne Shell                        46            768            612           3795
    JSON                                10             24              0           1347
    PowerShell                           2             87            120            292
    make                                 4             60             22            183
    C                                    8             27             12            179
    Windows Resource File                3             10              3             32
    Windows Message File                 1              7              0             32
    vim script                           2              9              5             18
    Assembly                             1              0              0              7
    -----------------------------------------------------------------------------------
    SUM:                              1915          38751          23193         209086
    -----------------------------------------------------------------------------------
    

    Please note that these are very raw estimations, using cloc. This might be worth a deeper analysis.
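    As a rough cross-check of the idea behind the `--exclude-dir` runs above, the split between vendored and non-vendored lines can be sketched in a few lines of Python. This is only an illustration: the helper names `count_lines` and `split_vendor` are made up for this example, the default directory names are taken from the cloc command above, and the count is of physical lines (blanks and comments included), so the numbers will not match cloc's code column.

```python
import os


def count_lines(root, exclude_dirs=()):
    """Count physical lines in every file under root.

    Directories whose name is in exclude_dirs are skipped entirely.
    Unlike cloc, blank lines and comments are counted too.
    """
    excluded = set(exclude_dirs)
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in excluded]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    total += sum(1 for _ in fh)
            except OSError:
                # Unreadable file or broken symlink: ignore it.
                continue
    return total


def split_vendor(root, vendor_dirs=("vendor", "_vendor", "Godeps")):
    """Return (own_lines, vendored_lines) for the tree under root."""
    everything = count_lines(root)
    own = count_lines(root, exclude_dirs=vendor_dirs)
    return own, everything - own
```

    Calling `split_vendor("kubernetes")` on a local checkout would give a first impression of how much of the tree is dependencies; cloc remains the better tool for a per-language breakdown.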

    Roughly, the project's own code seems to account for about half of the LoC mentioned in the question (~1,250K of the 2.4M+). Whether dependencies should count is subjective.
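    The arithmetic behind that "about half" can be checked directly. The figures below are copied from the question (the approximate 2.4M total) and from the two cloc summaries above; the variable names are just for this sketch.

```python
# Figures taken from the question and from the cloc output above.
CLAIMED_TOTAL = 2_400_000         # "2.4M+" quoted in the question (approximate)
K8S_CODE_NO_VENDOR = 1_253_173    # Kubernetes, cloc SUM, code column
K8S_GO = 1_043_546                # Kubernetes, Go only
DOCKER_CODE_NO_VENDOR = 209_086   # Docker, cloc SUM, code column
DOCKER_GO = 178_383               # Docker, Go only

# Share of the claimed total that is Kubernetes' own (non-vendored) code.
own_share = K8S_CODE_NO_VENDOR / CLAIMED_TOTAL

# Size of Kubernetes relative to Docker once vendored code is excluded.
total_ratio = K8S_CODE_NO_VENDOR / DOCKER_CODE_NO_VENDOR
go_ratio = K8S_GO / DOCKER_GO

print(f"own code share of claimed total: {own_share:.1%}")
print(f"Kubernetes vs Docker, all languages: {total_ratio:.2f}x")
print(f"Kubernetes vs Docker, Go only: {go_ratio:.2f}x")
```

    So even with dependencies excluded, Kubernetes comes out at roughly six times the size of the Docker repository by this measure, keeping in mind that the two runs excluded slightly different directories.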

    What is included in Kubernetes that makes it so big?

    Much of the bulk comes from libraries that support various cloud providers, either to ease bootstrapping on their platforms or to support specific features (volumes, etc.) through plugins. The repository also contains a lot of examples that should be dismissed from the line count. A fair LoC estimate needs to exclude the documentation and example directories.

    It is also much more feature-rich than Docker Swarm, Nomad, or Dokku, to cite a few. It supports advanced networking scenarios, has load balancing built in, and includes PetSets, Cluster Federation, volume plugins, and other features that other projects do not support yet.

    It supports multiple container engines, so it is not limited to running Docker containers and can potentially run other engines (such as rkt).
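    To illustrate why engine-neutrality costs code, here is a purely hypothetical sketch: the orchestration logic talks to an abstract interface, and each supported engine needs its own adapter. None of these class or function names come from Kubernetes; the two adapters are stand-ins that only generate IDs rather than driving a real engine.

```python
from abc import ABC, abstractmethod


class ContainerRuntime(ABC):
    """Hypothetical engine-neutral interface (not Kubernetes' real API)."""

    @abstractmethod
    def start(self, image: str) -> str:
        """Start a container from image and return its ID."""


class DockerLikeRuntime(ContainerRuntime):
    """Stand-in adapter; a real one would call the Docker engine."""

    def __init__(self):
        self._next_id = 0

    def start(self, image: str) -> str:
        self._next_id += 1
        return f"docker-{self._next_id}"


class RktLikeRuntime(ContainerRuntime):
    """Stand-in adapter; a real one would call rkt."""

    def __init__(self):
        self._next_id = 0

    def start(self, image: str) -> str:
        self._next_id += 1
        return f"rkt-{self._next_id}"


def deploy(runtime: ContainerRuntime, images):
    """Orchestration logic written once, against the interface only."""
    return [runtime.start(image) for image in images]


print(deploy(DockerLikeRuntime(), ["nginx", "redis"]))
print(deploy(RktLikeRuntime(), ["nginx"]))
```

    The `deploy` function stays the same, but every additional engine adds another adapter, along with its own tests and edge cases, which is one way the line count grows with the number of supported back ends.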

    A lot of the core logic involves interaction with other components (key-value stores, client libraries, plugins, etc.), which extends far beyond simple scenarios.

    Distributed systems are notoriously hard, and Kubernetes seems to support most of the tooling from key players in the container industry without compromise (where other solutions do make such compromises). As a result, the project can look artificially bloated and too big for its core mission (deploying containers at scale). In reality, these statistics are not that surprising.

    Key idea

    Comparing Kubernetes to Docker or Dokku is not really appropriate. The scope of the project is far bigger, and it includes many more features because it is not limited to the Docker family of tooling.

    While Docker has a lot of its features scattered across multiple libraries, Kubernetes tends to have everything under its core repository (which inflates the line count substantially but also explains the popularity of the project).

    Considering this, the LoC statistic is not that surprising.