azure, azure-blob-storage, azure-eventhub, azure-stream-analytics

Does a Stream Analytics job's Blob output produce as many files as the input Event Hub has partitions?


I have an Event Hub (containing JSON-formatted entities as events) which is the input to my Stream Analytics job. I created 4 partitions on my Event Hub but do not set a partition key, so the data should be distributed round-robin.

My Stream Analytics query is as simple as SELECT * FROM EventHub OUTPUT TO BLOB. The blob output is configured to aggregate data every 5 minutes, and the file path format is <date><HH>.
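(For reference, a pass-through query like the one described looks roughly like this in the Stream Analytics query language; the aliases [EventHubInput] and [BlobOutput] are placeholders for whatever the job's input and output are actually named:

    -- Pass-through: every event read from the Event Hub input is
    -- written to the blob output unchanged.
    SELECT *
    INTO [BlobOutput]
    FROM [EventHubInput]
)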

I see 4 files per hour in my blob storage; is this expected? Does Stream Analytics internally read each partition separately and in parallel?

Sorry if this sounds naïve; I am new to this and curious to know how Stream Analytics works internally.


Solution

  • Yes, this is expected.

    A Stream Analytics job can consume and write different partitions in parallel, which increases throughput. With a fully parallel query like yours, each input partition gets its own output writer, which is why you see one file per partition (see the sketch after this answer).

    For more details, please refer to Partitions in inputs and outputs.
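To illustrate the mechanism: the number of blob files per time window follows the partition layout of the query's last step. A fully parallel pass-through keeps all 4 input partitions and yields 4 files, while a non-partitioned step, such as a global windowed aggregate, collapses everything into a single stream and therefore a single writer. A sketch, using the same placeholder aliases as above:

    -- A global TumblingWindow aggregate has no PARTITION BY, so the job
    -- merges all input partitions into one stream before writing.
    -- Result: one output file per window, at the cost of parallelism.
    SELECT COUNT(*) AS EventCount,
           System.Timestamp() AS WindowEnd
    INTO [BlobOutput]
    FROM [EventHubInput]
    GROUP BY TumblingWindow(minute, 5)

Note that this changes what is written (an aggregate instead of the raw events); it is shown only to demonstrate how the partitioning of the last query step determines the number of output files.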