I have an Event Hub (containing JSON-formatted entities as events) which is the input to my Stream Analytics job. I have created 4 partitions on my Event Hub but do not set any partition key, so in theory the data is distributed round-robin.
My Stream Analytics query is as simple as SELECT * FROM EventHub, with the output going to blob storage. The blob output is configured with data aggregation every 5 minutes, and the file path format is <date><HH>.
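In case it helps, here is roughly what the full job query looks like in the Stream Analytics query language (EventHubInput and BlobOutput are placeholder aliases standing in for my actual input and output names):

```sql
-- Simple pass-through query: every event read from the Event Hub
-- input is written unchanged to the blob output.
-- "EventHubInput" and "BlobOutput" are placeholder aliases.
SELECT *
INTO [BlobOutput]
FROM [EventHubInput]
```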
I see 4 files per hour in my blob storage. Is this expected? Does SA internally read each partition separately, at the same time?
Sorry if this sounds naïve; I am new to this and curious to know how SA works internally.
Yes, this is expected.
A Stream Analytics job can consume and write different partitions in parallel, which increases throughput. Since your query is a simple pass-through, the job can process each of your 4 input partitions independently and write one blob per partition, which is why you see 4 files per hour.
For more details, please refer to Partitions in inputs and outputs.
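If you want to make the per-partition processing explicit (which is required for a fully parallel job on compatibility levels below 1.2; level 1.2 and above aligns partitions automatically when the query allows it), you can partition the query on PartitionId. A sketch, using the same placeholder aliases as in your question:

```sql
-- Explicitly partitioned pass-through: each Event Hub partition is
-- handled by its own query instance, so with 4 input partitions you
-- get 4 parallel writers (and 4 output blobs per time window).
-- "EventHubInput" and "BlobOutput" are placeholder aliases.
SELECT *
INTO [BlobOutput]
FROM [EventHubInput]
PARTITION BY PartitionId
```

Either way, the number of output files tracks the input partition count, not anything about the blob output itself.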