I have two general questions about Stream Analytics behavior. I found either nothing or (for me) misleading information in the documentation about them.
Both questions concern a Stream Analytics job with Event Hub as the input source.
1. Stream position
When the analytics job starts, are only events that arrive after startup processed? Are older events that are still retained in the Event Hub ignored?
2. Long span time window
The documentation states:
"The output of the window will be a single event based on the aggregate function used with a timestamp equal to the window end time."
If I create a SELECT statement with, for example, a 7-day tumbling window, is there any limit on how many output elements the job can hold in memory before the window closes and the result set is sent out? On my heavy-workload Event Hub that could be millions of output results.
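For reference, a query of the kind I mean would look roughly like this (a sketch only; the input/output names `EventHubInput` and `BlobOutput` and the `DeviceId` field are placeholders, not actual names from my job):

```sql
-- Sketch of a 7-day tumbling window aggregation.
-- One row per DeviceId is emitted when each 7-day window closes.
SELECT
    DeviceId,
    COUNT(*) AS EventCount,
    System.Timestamp AS WindowEnd   -- equals the window end time, per the docs
INTO
    BlobOutput
FROM
    EventHubInput TIMESTAMP BY EventEnqueuedUtcTime
GROUP BY
    DeviceId,
    TumblingWindow(day, 7)
```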
For your first question, there is no evidence that Stream Analytics ignores older events from before the job startup. The event lifecycle actually depends on the Event Hub message retention setting (1 to 7 days), not on Stream Analytics. However, you can specify the eventStartTime
and eventEndTime
properties for an input to retrieve exactly the data you want; please see the REST request properties of a Stream Analytics Input.
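As a rough, illustrative sketch only (check the REST reference above for the exact request schema; the timestamps below are placeholder values, and the remaining required properties such as the data source and serialization are omitted), the relevant part of the input definition would look something like:

```json
{
  "properties": {
    "type": "Stream",
    "eventStartTime": "2016-01-01T00:00:00Z",
    "eventEndTime": "2016-01-08T00:00:00Z"
  }
}
```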
On the Azure portal, these two properties can be set in the input configuration.
For your second question, according to the Azure limits & quotas for Stream Analytics and the reference for Windowing, there is no documented limit on memory usage for the output of a window. The only documented constraint is on the window itself: the maximum window size is 7 days. So a long window does not cap the number of results, but it does delay the output, since the results are emitted only when the window closes.