Repost from users@apex.incubator.apache.org
Apex utilizes buffer server for back pressure. How does the buffer server survive application crashes? What if the buffer server itself dies? Will Apex guarantee that the downstream operator will eventually catch up with the upstream operator when the buffer server is brought back up?
Buffer server is a pub-sub mechanism within Apex platform that is used to stream data between operators. The buffer server always lives in the same container as the upstream operator (one buffer server per container irrespective of number of operators in container); and the output of upstream operator is written to buffer server. The current operator subscribes from the upstream operator's buffer server when a stream is connected.
So if an operator fails, the upstream operator's buffer server will have the required data state until a common checkpoint is reached. If the upstream operator fails, its upstream operator's buffer server has the data state and so on. Finally, if the input operator fails, which has no upstream buffer server, then the input operator is responsible to replay the data state. Depending on the external system, input operator either relies on the external system for replays or maintain the data state itself until a common checkpoint is reached.
If for some reason the buffer server fails, the container hosting the buffer server fails. So, all the operators in the container and their downstream operators are redeployed from last known checkpoint.