Search code examples
springspring-bootcloud-foundry

Spring boot application restart automatically when Cloud foundry updates/upgrade


I am using Cloud Foundry and I deployed my Spring boot application on Cloud. Whenever there is some updates/upgrade happens on Cloud foundry, my application got restart and some request got failed to reach to application as restart of application takes more time to get up. Is there any way in CF that some instances of application will be running while upgrade/restart of application to process requests. Also I want to know, if CF provides services from different locations/regions, so consider my application will be deployed on 2 CF containers available on different region. Wherever there is some updates/upgrade available, proceed upgrade on one region for Cf so other CF service from another region will be available and some application instances will be running to serve requests and vice versa. -Thank you.


Solution

  • What you're describing is the intended behavior of CF.

    • If you have two or more instance of your application, they should never both go down at the same time. i.e. one will be taken down, then after it's restarted successfully, then the other will be taken down and restarted.

    • If your operator has configured multiple availability zones for the foundation that you've targeted, then application instances will be distributed across those AZs to help facilitate HA and best possible availability.

    If you're not seeing this behavior then you should take a look at the following as these items can affect uptime of your apps:

    • Do you have more than one application instance? If you only have one application instance, then you can expect to see some small windows of downtime when updates are applied to the foundation and under other scenarios. This happens because a times Diego will need to evict applications running on a Diego Cell. It makes an attempt to start your app on another Cell before stopping the current instance, but there are no guarantees provided around this. Thus you can end up with some downtime, if for example your app is slow to start or your app does not have a good health check configured (like it passes the health check before the app is really up).

    • Did your operator set up multiple AZs? As a developer, you cannot really tell. This is abstracted away, so you would need to ask your platform operations team and confirm if there are more than one and if so how many. For best possible uptime, have at least as many app instances as you have AZs.

    • The other thing often overlooked, does your application depend on any services? If so, it is also possible that you will see downtime when services are being updated. That all depends on the services you are using and if there will be associated downtime for management and upgrades of those services. You may be able to tell if this is the case by looking more closely at your application logs when it fails to see if there are connection failures or errors like that. You might also be able to tell by looking at the plan defined in the CF Marketplace. Often the description will say if there are stipulations regarding the plan, like it is or isn't clustered or HA.

    UPDATE

    One other thing which can cause downtime:

    • If your operator has the "max in flight" value too high for the number of Diego Cells this can also cause downtime. Essentially, "max in flight" dictates how many Diego Cells will be taken out of service during an upgrade. If this value is too high, you can run into a situation where there is not enough capacity in the remaining Cells to host all of your applications. This ends up resulting in downtime for app instances as they cannot be rescheduled on another Cell in a timely manner. As a developer, I don't think this is something you can troubleshoot, you would need to work with your platform operators to investigate further.

    That is probably a theme here. If you are an app developer, you should be talking to your platform operations team to debug this.