
How many nodes can a single open source chef-server handle?


The following is the hardware configuration of the open source chef-server in use:

- 4 GB of RAM
- 5 GB of free disk space in /opt
- 5 GB of free disk space in /var
- Chef-server version 11.6 installed

How many nodes, all running chef-client in daemon mode, can this chef-server handle? Would it be possible for a single chef-server to handle 300-500 nodes?

I have seen three configuration scenarios (Standalone, HA, Tiered) described in the official Chef documentation. What factors decide which scenario to use? Is there a specific reason to choose a tiered configuration? Does it allow a larger number of nodes?


Solution

  • I have had 42 nodes on an open source chef server of this size and it performed well, but I had issues when bootstrapping more than 4 nodes at a time with complex run-list/role assignments.

    This environment consisted of:
    - 20 unicorn/nginx nodes (highly complex setup)
    - 2 redis nodes (simple setup)
    - 3 mysql nodes (medium complex setup)
    - 15 mongodb nodes (medium complex setup)
    - 2 nginx nodes (simple setup)

    Here's a sketch of what my unicorn/nginx nodes looked like. I would call this a highly complex setup:
    - Creating 5 unicorn and nginx endpoints using data bags, encrypted data bags, git repos, and templates
    - Creating a nodejs endpoint
    - Creating 10 Linux users using data bags and encrypted data bags
    - Setting up log rotation
    - Setting up config files based on the mysql, mongo and redis servers in my ecosystem
    - Creating an NFS mount and scripting backups to that NFS mount
    - About 20 other basic sysadmin-type tasks

    All of the nodes in my environment checked in to the Chef server on the standard 1800 second interval. Lowering this interval will increase the load on your server. If the interval drops below the time it takes for all of your nodes to finish their chef-client runs, your performance problems will snowball.
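    The relationship between node count, interval and run time can be put into rough numbers. The following is a minimal sketch, assuming check-ins are spread evenly across the interval. The method names and the 120 second run time are illustrative assumptions, not measurements from the environment above and not part of Chef.

```ruby
# Rough load estimate for periodic chef-client runs (a sketch, not a benchmark).
# All method names are illustrative; none of them are part of Chef.

# Average number of chef-client runs in progress at any moment,
# assuming check-ins are spread evenly across the interval.
def average_concurrent_runs(node_count, run_seconds, interval_seconds)
  node_count * run_seconds.to_f / interval_seconds
end

# Average number of chef-client runs that start each minute.
def runs_started_per_minute(node_count, interval_seconds)
  node_count * 60.0 / interval_seconds
end

# True when a run takes longer than the interval between runs,
# which is the point where runs start piling up.
def runs_overlap?(run_seconds, interval_seconds)
  run_seconds > interval_seconds
end

assumed_run_seconds = 120 # hypothetical run time, measure your own
interval_seconds = 1800   # the standard interval mentioned above

[42, 300, 500].each do |nodes|
  concurrent = average_concurrent_runs(nodes, assumed_run_seconds, interval_seconds)
  per_minute = runs_started_per_minute(nodes, interval_seconds)
  puts "#{nodes} nodes: ~#{concurrent.round(1)} runs in progress, " \
       "~#{per_minute.round(1)} runs starting per minute"
end
```

    Halving the interval doubles both figures, which is why a shorter interval shows up directly as server load.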

    I ran into bottlenecks when bootstrapping multiple complex new nodes at one time, because on a new node nearly every recipe has to take action: calls to remote repos, setting directory permissions, service restarts, chkconfig updates, writing out config files, etc.
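    One way to stay under that limit is to bootstrap in fixed-size batches. This is a minimal sketch, assuming a batch size of 4 (the point where I started to see issues). The hostnames and the `bootstrap_batches` helper are hypothetical, and the actual bootstrap call is left as a comment.

```ruby
# Group hosts into batches so that no more than `batch_size` nodes
# are bootstrapped at the same time. Illustrative helper, not part of knife.
def bootstrap_batches(hosts, batch_size = 4)
  hosts.each_slice(batch_size).to_a
end

hosts = (1..10).map { |i| "app#{i}.example.com" } # hypothetical hostnames

bootstrap_batches(hosts).each_with_index do |batch, index|
  puts "Batch #{index + 1}: #{batch.join(', ')}"
  # Each host in the batch would be bootstrapped here (for example with
  # `knife bootstrap`), waiting for the whole batch to finish before
  # starting the next one.
end
```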

    I ran into performance bottlenecks on chef-client runs when asking the chef server to locate hosts that match specific criteria using Chef's search function. For example, searching for a database (mysql, redis, etc.) master node in order to get the FQDN or IP of that box so Chef can then proceed with a database slave setup. I resolved these issues by reducing the number of searches and also searching for Chef tags with tag of .
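    One possible way to reduce the number of searches is to cache results so that a repeated query within a run reaches the server only once. This is a sketch of that idea and not necessarily what I did in my cookbooks. `SearchCache`, the stand-in backend, the hostname and the `tags:example_tag` query are all hypothetical placeholders; in a real recipe the backend block would wrap Chef's own search call.

```ruby
# Caches search results so each distinct query reaches the server only once.
# `backend` is any block taking (index, query). Here it is a stand-in that
# returns made-up data instead of talking to a Chef server.
class SearchCache
  attr_reader :server_calls

  def initialize(&backend)
    @backend = backend
    @results = {}
    @server_calls = 0
  end

  # Returns the cached result when this (index, query) pair was seen before,
  # otherwise calls the backend once and remembers the answer.
  def search(index, query)
    key = [index, query]
    return @results[key] if @results.key?(key)

    @server_calls += 1
    @results[key] = @backend.call(index, query)
  end
end

# Stand-in backend returning a fixed, made-up result.
cache = SearchCache.new { |_index, _query| ['db1.example.com'] }

# Three lookups with the same placeholder query trigger one backend call.
3.times { cache.search(:node, 'tags:example_tag') }
puts "backend calls: #{cache.server_calls}"
```

    The trade-off is staleness: a cached result will not reflect nodes that register after the first lookup, so a cache like this only makes sense within a single run.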

    To help guide your performance analysis, consider asking: "How many minutes is acceptable for a knife bootstrap?" "How many minutes is acceptable for a chef-client run?" "What will be my most complicated Chef cookbooks/run-lists?"