I am using the grafana-tempo distributed Helm chart. It is deployed successfully, its backend is configured on Azure Storage (blob containers), and it is working fine.
I have a demo application that is sending traces to grafana-tempo, and I can confirm the traces are being received.
The issue I have observed is that after exactly 30 minutes my ingester pods go into a Back-off restarting state, and I have to manually restart their StatefulSet.
While searching for the root cause, I found that there is a parameter, max_block_duration, which has a default value of 30m: "max_block_duration: maximum length of time before cutting a block."
So I tried to increase the timing and set the value to 60m. Now my ingester pods go into the Back-off restarting state after 60 minutes instead.
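For reference, this is roughly how I overrode the value. This is only a sketch, assuming the tempo-distributed chart merges an `ingester.config` block from the values file into the generated Tempo configuration (the exact key path may differ between chart versions):

```yaml
# Sketch of the ingester override in the Helm values file.
# Assumption: the chart exposes the Tempo ingester settings under
# ingester.config and renders them into tempo.yaml.
ingester:
  config:
    max_block_duration: 60m   # default is 30m; raising it only delayed the restarts for me
```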
I have also enabled autoscaling, but no new pods come up when all the ingester pods are in this error state.
Can someone help me understand why this is happening and suggest a possible solution to eliminate the issue?
What value should be passed to max_block_duration so that these pods do not go into the Back-off restarting state?
I expect my ingester pods to keep running without restarts.
I also opened a GitHub issue on Tempo, and the issue no longer exists at my end. If someone is facing the same problem, you can look at my GitHub issue for more insight: https://github.com/grafana/tempo/issues/2488