
Graceful degradation in an application that uses Microservices Architecture


I know that graceful degradation is a relatively new topic within Chaos Engineering. There is some material describing how the strategy works, but I haven't found resources that show how to apply it to real-world problems.

  1. Is this kind of strategy a requirement for any application which uses Microservices Architecture?
  2. Are there already libraries or frameworks that ease its implementation?
  3. Is monitoring such an application different from monitoring one that does not use this strategy?

Solution

  • Is this kind of strategy a requirement for any application which uses Microservices Architecture?

    I wouldn't say it is a requirement. After all, you may have other challenges to solve before you get to Chaos Engineering (CE), and you can skip CE entirely if you have other mechanisms for coping with the problems it tries to uncover.

  • Are there already libraries or frameworks that ease its implementation?

    Depending on the stack you're using, there are several: Chaos Monkey for Spring Boot, Gremlin, Chaos Mesh, and so on (see: https://github.com/dastergon/awesome-chaos-engineering). Simpler tools include tc and stress, which let you inject network faults and resource load by hand.
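
    To give a feel for what such tools do at the application level, here is a minimal sketch of a fault-injection wrapper combined with a fallback. It does not use the API of any of the tools listed above; the names chaos, InjectedFault, fetch_recommendations and recommendations_or_default are hypothetical and exist only for this illustration.

```python
import random
import time
from functools import wraps


class InjectedFault(RuntimeError):
    """Raised instead of calling the wrapped function when a fault is injected."""


def chaos(failure_rate=0.0, max_delay_s=0.0, rng=None):
    """Wrap a function so that calls are randomly delayed or failed.

    failure_rate: probability in [0, 1] that a call raises InjectedFault.
    max_delay_s:  upper bound for a random delay added before each call.
    rng:          random.Random instance; pass a seeded one for repeatable runs.
    """
    rng = rng or random.Random()

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if max_delay_s > 0:
                time.sleep(rng.uniform(0, max_delay_s))
            if rng.random() < failure_rate:
                raise InjectedFault(f"injected failure in {func.__name__}")
            return func(*args, **kwargs)

        return wrapper

    return decorator


@chaos(failure_rate=0.3, rng=random.Random(42))
def fetch_recommendations(user_id):
    # Hypothetical downstream call; stands in for a real service request.
    return [f"item-{user_id}-{n}" for n in range(3)]


def recommendations_or_default(user_id):
    """Degrade gracefully: fall back to a static list if the dependency fails."""
    try:
        return fetch_recommendations(user_id)
    except InjectedFault:
        return ["bestseller-1", "bestseller-2"]
```

    With roughly 30% of the calls failing, callers of recommendations_or_default still always get a usable answer, which is the behaviour a chaos experiment is meant to confirm. In a real system the fault would usually come from a tool outside the process rather than from a decorator in the code.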

  • Is monitoring such an application different from monitoring one that does not use this strategy?

    In my experience it is not that different. However, the field of monitoring has changed quite a bit in recent years. I would recommend any system that gives you a good deal of observability in (almost) real time. Anything that improves your application performance monitoring helps a great deal when doing Chaos Engineering.

    Applying it to real-world examples gets easier once you get going. A good starter experiment (in my experience) is restarting a database or doing a rolling update. Everything you do with CE should happen under load. Without requests flowing through your environment (often the case in staging environments), you will not see what would really happen in production. Also, start as small as possible, then move on to bigger problems as you gain experience and trust in your systems.
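
    The starter experiment described above could be rehearsed in miniature as follows. This is only a sketch under stated assumptions: FakeDatabase is a hypothetical in-memory stand-in for a real database, the restart is simulated by a short outage window, and the 90% success threshold is an arbitrary example of a steady-state expectation.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeDatabase:
    """Hypothetical stand-in for a database that can be stopped and started."""

    def __init__(self):
        self._available = threading.Event()
        self._available.set()

    def stop(self):
        self._available.clear()

    def start(self):
        self._available.set()

    def query(self):
        if not self._available.is_set():
            raise ConnectionError("database unavailable")
        return "row"


def handle_request(db, retries=3, backoff_s=0.05):
    """One request: retry with growing pauses, report failure instead of raising."""
    for attempt in range(retries):
        try:
            db.query()
            return True
        except ConnectionError:
            time.sleep(backoff_s * (attempt + 1))
    return False


def run_experiment(total_requests=400, workers=8, outage_s=0.05, min_success=0.9):
    """Restart the fake database while requests are in flight.

    Returns the observed success rate and whether it met min_success.
    """
    db = FakeDatabase()

    def restart():
        time.sleep(0.02)      # let some load build up first
        db.stop()
        time.sleep(outage_s)  # simulated restart duration
        db.start()

    def paced_request(_):
        time.sleep(0.001)     # spread the requests over time
        return handle_request(db)

    chaos_thread = threading.Thread(target=restart)
    chaos_thread.start()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outcomes = list(pool.map(paced_request, range(total_requests)))
    chaos_thread.join()

    success_rate = sum(outcomes) / len(outcomes)
    return success_rate, success_rate >= min_success
```

    Calling run_experiment() returns the success rate observed during the simulated restart and a flag saying whether the expectation held. Starting small here means a short outage and modest load; outage_s and total_requests can be raised step by step as trust in the retry behaviour grows.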