How to calculate network system downtime

Here are two systems, A and B. How do I calculate the expected downtime of each?

For A, should it be: 0.01 * 10 * 6 * 12 = 7.2 hours/year?

System A has 10 physical nodes. If any one of those nodes fails, the whole system goes down. The probability of failure for an individual node is 1% per month, and each failure takes 6 hours to fix. What is the downtime of the whole system per year?

System B has 10 physical nodes. As long as 9 out of 10 nodes are running, the whole system functions normally. The probability of failure for an individual node is 1% per month, and each failure takes 6 hours to fix. What is the downtime of the whole system per year?

Solution

We are talking about expected downtimes here, so we'll have to take a probabilistic approach.

We can take a Poisson approach to this problem. The expected failure rate is 1% per month for a single node, so across 10 nodes over 12 months we expect 0.01 * 10 * 12 = 1.2 failures per year. So you are correct that 1.2 failures/year * 6 hours/failure = 7.2 hours/year is the expected downtime for A.
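The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the constant and function names are made up for this example, and it assumes the 1% monthly probability can be treated as a failure rate and that repairs never overlap.

```python
# Expected yearly downtime for system A, where any single node failure
# takes the whole system down. All names here are illustrative.
NODES = 10
P_FAIL_PER_MONTH = 0.01   # per node, from the question
MONTHS_PER_YEAR = 12
REPAIR_HOURS = 6          # downtime per failure, from the question


def expected_failures_per_year(nodes, p_month, months=MONTHS_PER_YEAR):
    """Expected number of node failures per year across all nodes."""
    return nodes * p_month * months


failures = expected_failures_per_year(NODES, P_FAIL_PER_MONTH)
downtime_hours = failures * REPAIR_HOURS
print(f"{failures:.1f} failures/year, {downtime_hours:.1f} hours/year")
```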

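The Poisson variable here is the number of failures per year, with lambda = 1.2, and each failure costs 6 hours. So you can figure out how likely a given amount of downtime is from the distribution of the failure count.

Using R: ppois(1, lambda=1.2) = 0.66, meaning there is a 66% chance of at most one failure, and therefore at most 6 hours of downtime, in a year. The chance of no downtime at all is dpois(0, lambda=1.2) = 0.30.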
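Because each failure costs a fixed 6 hours, the downtime distribution for A can be read off the failure count, which is Poisson with a mean of 1.2 per year. Here is a minimal Python sketch of that reading, using only the standard library. The helper names are invented for this example, and it assumes independent failures and repairs that do not overlap.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_cdf(k, lam):
    """P(X <= k), the quantity R's ppois(k, lam) returns."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


LAMBDA_FAILURES = 1.2   # expected failures per year for system A
REPAIR_HOURS = 6        # downtime per failure

# Probability of staying at or below 0, 6, 12 and 18 hours of downtime.
for k in range(4):
    p = poisson_cdf(k, LAMBDA_FAILURES)
    print(f"P(at most {k} failures, <= {k * REPAIR_HOURS} h down) = {p:.3f}")
```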

For B, failures are also Poisson, but what matters is the probability that a second node fails during the six hours it takes to repair the first.

The failure rate (assuming a 30-day month, which contains 120 six-hour periods) is 0.01 / 120, or about 0.0083%, per six-hour period per node.
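As a quick check on that conversion, the sketch below compares the simple division with an exact conversion of a 1% monthly probability into a per-period probability. The variable names are illustrative, and the exact form assumes the failure probability is constant across periods.

```python
P_FAIL_PER_MONTH = 0.01            # per node, from the question
WINDOWS_PER_MONTH = 30 * 24 // 6   # 120 six-hour periods in a 30-day month

# Linear approximation used in the text.
linear = P_FAIL_PER_MONTH / WINDOWS_PER_MONTH

# Exact per-period probability if each period fails independently with
# the same probability and the monthly total is 1%.
exact = 1 - (1 - P_FAIL_PER_MONTH) ** (1 / WINDOWS_PER_MONTH)

print(f"linear: {linear:.6%}  exact: {exact:.6%}")
```

The two values differ by well under 1%, so the linear shortcut is adequate at this failure rate.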

So we look at the probability of two failures within the same six-hour period, times the number of six-hour periods in a year.

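Any of the 10 nodes can fail, so the rate per six-hour period for the whole system is 10 * 0.01/120.

Using R: dpois(2, lambda=(10 * 0.01/120)) * 365 * 4 = 0.000507 expected double failures per year.

0.000507 * 3 expected hours/failure = 0.0015 hours, or about 5.5 seconds of expected downtime per year. (3 expected hours per double failure because the second failure should occur, on average, halfway through the repair of the first.)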
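The calculation for B can be sketched in Python with the rate per six-hour period left as a parameter, so that the per-node rate (0.01/120) and the combined rate for all 10 nodes (10 * 0.01/120) can be compared. The function names are invented for this example. It keeps the same simplifications as the text: fixed six-hour periods, two failures in the same period counted as an outage, and an average overlap of 3 hours.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean lam (R's dpois)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


NODES = 10
PER_NODE_RATE = 0.01 / 120   # failures per node per six-hour period
WINDOWS_PER_YEAR = 365 * 4   # six-hour periods in a year
MEAN_OVERLAP_HOURS = 3       # assumed average overlap of two repairs


def yearly_overlap_downtime(window_rate):
    """Expected hours per year with two failures in the same period."""
    double_failures = poisson_pmf(2, window_rate) * WINDOWS_PER_YEAR
    return double_failures * MEAN_OVERLAP_HOURS


single = yearly_overlap_downtime(PER_NODE_RATE)
combined = yearly_overlap_downtime(NODES * PER_NODE_RATE)
print(f"per-node rate: {single * 3600 * 1000:.1f} ms/year")
print(f"combined rate: {combined * 3600:.2f} s/year")
```

The result scales roughly with the square of the rate, so the choice between the per-node and the combined rate changes the answer by a factor of about 100.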