I am intermittently getting the "Job Failed" status on an asynchronous job, often several times in a row. I have an exponential backoff of 5x, and it usually succeeds on the 5th attempt. In case it is relevant, there is one application that hits the API continuously for up to 8 hours with various access tokens.
What could be causing this?
Ideally, the job should time out rather than run for so long. If your account has too much data, try using different (for example, shorter) date ranges.
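As a rough illustration of the date-range suggestion, here is a minimal Python sketch that splits one long range into shorter chunks so each report request covers less data. The function name `split_date_range` and the `max_days` parameter are made up for this example; how you pass the dates to the API depends on the API you are calling.

```python
from datetime import date, timedelta

def split_date_range(start, end, max_days):
    """Yield inclusive (chunk_start, chunk_end) pairs covering start..end.

    Hypothetical helper: each chunk spans at most max_days days.
    """
    if max_days < 1:
        raise ValueError("max_days must be at least 1")
    current = start
    while current <= end:
        # Clamp the last chunk so it never runs past the requested end date.
        chunk_end = min(current + timedelta(days=max_days - 1), end)
        yield current, chunk_end
        current = chunk_end + timedelta(days=1)

# Example: request the report once per chunk instead of once for the whole range.
for chunk_start, chunk_end in split_date_range(date(2024, 1, 1), date(2024, 1, 10), 4):
    print(chunk_start.isoformat(), chunk_end.isoformat())
```

If the smaller chunks succeed where the full range fails, the amount of data per request is a likely cause.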
When you poll for the report, you should also check the percentage completion, which will give you more detail on how far the job got before it failed.
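A minimal sketch of that kind of polling, assuming you wrap your API call in a function that returns a dict with `status` and `percent_complete` keys. Those key names, the `get_status` callable, and every status string other than "Job Failed" are assumptions for illustration, not the real API's fields. The loop waits with exponential backoff between polls and records the percentage on every poll.

```python
import time

def poll_report(get_status, max_polls=5, base_delay=1.0, factor=5.0, sleep=time.sleep):
    """Poll a job until it finishes, fails, or max_polls is reached.

    get_status is a hypothetical callable wrapping the real API request; it is
    assumed to return a dict with "status" and "percent_complete" keys.
    Returns (final_status, history), where history lists every
    (status, percent_complete) pair seen.
    """
    history = []
    delay = base_delay
    for poll in range(max_polls):
        result = get_status()
        status = result.get("status")
        history.append((status, result.get("percent_complete")))
        # Stop as soon as the job reaches a terminal state.
        if status in ("Job Completed", "Job Failed"):
            return status, history
        if poll < max_polls - 1:
            sleep(delay)      # wait before the next poll
            delay *= factor   # exponential backoff: 1s, 5s, 25s, ...
    return "Timed Out", history
```

Logging `history` for the failed runs shows whether the job always stalls at the same percentage or fails at random points, which helps narrow down the cause.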