A watermark allows late-arriving data to be included in already computed windowed results for a bounded period of time. Its premise is that it tracks a point in time before which no more late events are expected to arrive; if such events do arrive, they are nonetheless discarded.
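To make the premise concrete, here is a minimal pure-Python sketch of the bookkeeping, not Spark code. It assumes a simplified model: the watermark for a micro-batch is the maximum event time seen in earlier batches minus the delay threshold, and an event is dropped when the end of its tumbling window is at or before that watermark. The names `window_end` and `split_batches` are hypothetical helpers made up for this illustration.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)  # tumbling window size (assumed, matches the question)
DELAY = timedelta(hours=1)      # watermark delay threshold (assumed, matches the question)


def window_end(event_time: datetime) -> datetime:
    """Return the end of the tumbling window that contains event_time."""
    epoch = datetime(1970, 1, 1)
    offset = (event_time - epoch) % WINDOW
    return event_time - offset + WINDOW


def split_batches(batches):
    """Classify event times, batch by batch, as kept or dropped.

    Simplified model: the watermark used for a batch is derived from the
    maximum event time seen in all earlier batches.
    """
    max_seen = None
    kept, dropped = [], []
    for batch in batches:
        watermark = max_seen - DELAY if max_seen is not None else None
        for ts in batch:
            if watermark is not None and window_end(ts) <= watermark:
                dropped.append(ts)  # window already closed: event is discarded
            else:
                kept.append(ts)     # window still open: event updates the result
        if batch:
            batch_max = max(batch)
            max_seen = batch_max if max_seen is None else max(max_seen, batch_max)
    return kept, dropped


def t(hour: int, minute: int) -> datetime:
    """Build a timestamp on an arbitrary example day."""
    return datetime(2024, 1, 1, hour, minute)


batches = [[t(12, 0)], [t(14, 0)], [t(13, 40), t(12, 5)]]
kept, dropped = split_batches(batches)
print("kept:", kept)
print("dropped:", dropped)
```

Under this model, the 13:40 event (20 minutes behind the latest event at 14:00) is still kept, because the watermark is only at 13:00. Only the 12:05 event, whose window ended at 12:10, falls behind the watermark and is dropped.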
Is there a way to store the discarded data so that it can be used for reconciliation later? Say that in my Structured Streaming job I set the watermark to 1 hour, run a window operation over 10-minute windows, and receive an event that is 20 minutes late. Is there a way to store the discarded data at a different location rather than dropping it?
No, there is no way to achieve this.