I'm sending the same query to a large (well, okay, n=5
) number of identical endpoints (a few kubernetes clusters) and collating the results together. I want the requests to go out in parallel and time out after a while; I want failures to be reported to the user without impeding further progress.
As an example of the desired output:
$ node find_broken_pods.js
tomato: error: timed out
Cluster Pod Reason
potato bar-foos Invalid PVC
potato foo-eggnog Invalid image
yoghurt spam-baz Invalid name
$ node find_broken_pods.js
Cluster Pod Reason
potato bar-foos Invalid PVC
potato foo-eggnog Invalid image
yoghurt spam-baz Invalid name
tomato what-bat Insufficient jQuery
The first time around, we failed to get an answer from the tomato cluster for whatever reason, but we were able to enumerate the resources from the other clusters. The second time around, there was no timeout and we were able to list everything.
I previously had come up with this:
export async function queryAll(): Promise<{cluster: string; deployments: V1Pod[]}[]> {
const out: {cluster: string; result: V1Pod[]}[] = [];
const promises: Promise<number>[] = [];
for (const cluster of Object.values(CLUSTERS)) {
promises.push(
Promise.race([
new Promise<number>((_, reject) => setTimeout(() => reject(new Error(`${cluster}: timed out`)), 5000)),
new Promise<number>((resolve, _) =>
getAllPods(cluster)
.then(pods => resolve(out.push({cluster: cluster, result: pods})))
),
])
);
}
await Promise.all(promises);
return out;
}
This does run everything in parallel, but a single failure results in the entire function crashing and burning. I figured I could change it to this:
export async function queryAll(): Promise<{cluster: string; deployments?: V1Deployment[]; error?: string}[]> {
const out: {cluster: string; result?: V1Pod[]; error?: string}[] = [];
const promises: Promise<number>[] = [];
for (const cluster of Object.values(CLUSTERS)) {
promises.push(
Promise.race([
new Promise<number>((resolve, _) =>
setTimeout(() => {
resolve(out.push({cluster: cluster, error: 'timed out'}));
}, 5000)
),
new Promise<number>((resolve, _) =>
getAllPods(cluster)
.then(pods => resolve(out.push({cluster: cluster, result: pods})))
),
])
);
}
await Promise.all(promises);
return out;
}
However, I'm now witnessing both arms of the promise run to completion, that is the out
array contains (some) clusters reporting data and either all or none of my clusters reporting a timeout:
Promise.all
won't wait for the setTimeout
s, but the node process will (i.e. if I set the timeout to 60 seconds then the node process will only exit after 60 seconds)Promise.all
will wait for the timeout to happen... meaning all setTimeout
s end up triggering.I instead expected that the losing arm of the Promise.race
would be killed somehow or prevented from running.
Something tells me my approach is fundamentally broken... how can I improve my fault tolerance?
Yes, you've got both tasks running and push
ing objects to the out
array. Nothing is stopping them - Promise.race
does not magically undo the steps that were taken to create the promises passed to it.
You can solve this with a flag for each cluster whether the timeout or result has already been reported so that the loser of the race won't report their status.
However, it's much simpler to just use the result value that the promises fulfill with, which Promise.race
already does forward:
interface QueryResult {cluster: string; deployments?: V1Deployment[]; error?: string}
export async function queryAll(): Promise<QueryResult[]> {
const promises: Promise<QueryResult>[] = Object.values(CLUSTERS).map(cluster =>
Promise.race([
new Promise(resolve => {
setTimeout(() => {
resolve({cluster: cluster, error: 'timed out'});
}, 5000)
}),
getAllPods(cluster).then(pods => ({cluster: cluster, result: pods}))
])
);
const out = await Promise.all(promises);
return out;
}
With this approach, you even get the results consistently in the same order as the CLUSTERS
, not in the order they came back. If you don't want that, you can still take your original approach with push
- just do it with the result of each race!
export async function queryAll(): Promise<QueryResult[]> {
const out: QueryResult[] = [];
await Promise.all(Object.values(CLUSTERS).map(cluster =>
Promise.race([
new Promise(resolve => {
setTimeout(() => {
resolve({cluster: cluster, error: 'timed out'});
}, 5000)
}),
getAllPods(cluster).then(pods => ({cluster: cluster, result: pods}))
]).then(result => {
out.push(result);
})
));
return out;
}
Btw, also consider Promise.allSettled
instead of Promise.all
- you might as well reject
from your timeout promise with that.