azure, azure-blob-storage, azure-eventhub, azure-stream-analytics

Does a Stream Analytics job's Blob output produce as many files as the input Event Hub has partitions?


I have an Event Hub (containing JSON-formatted entities as events) which is the input to my Stream Analytics job. I created 4 partitions on my Event Hub but do not set a partition key, so the data should be distributed round-robin.

My Stream Analytics query is as simple as SELECT * FROM EventHub OUTPUT TO BLOB. The blob output is configured to aggregate data every 5 minutes, and the file path format is <date><HH>.
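(For reference, a pass-through query like the one described looks roughly like this in the Stream Analytics query language; the aliases [EventHubInput] and [BlobOutput] are placeholders for whatever the job's input and output are actually named:

    -- Pass-through: every event read from the Event Hub input is
    -- written to the blob output unchanged.
    SELECT *
    INTO [BlobOutput]
    FROM [EventHubInput]
)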

I see 4 files per hour in my blob storage; is this expected? Does Stream Analytics internally read each partition separately and in parallel?

Sorry if this sounds naïve; I am new to this and curious to know how Stream Analytics works internally.


Solution

  • Yes, this is expected.

    A Stream Analytics job can consume and write different partitions in parallel, which increases throughput. With a fully parallel query like yours, each input partition gets its own output writer, which is why you see one file per partition (see the sketch after this answer).

    For more details, please refer to Partitions in inputs and outputs.
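To illustrate the mechanism: the number of blob files per time window follows the partition layout of the query's last step. A fully parallel pass-through keeps all 4 input partitions and yields 4 files, while a non-partitioned step, such as a global windowed aggregate, collapses everything into a single stream and therefore a single writer. A sketch, using the same placeholder aliases as above:

    -- A global TumblingWindow aggregate has no PARTITION BY, so the job
    -- merges all input partitions into one stream before writing.
    -- Result: one output file per window, at the cost of parallelism.
    SELECT COUNT(*) AS EventCount,
           System.Timestamp() AS WindowEnd
    INTO [BlobOutput]
    FROM [EventHubInput]
    GROUP BY TumblingWindow(minute, 5)

Note that this changes what is written (an aggregate instead of the raw events); it is shown only to demonstrate how the partitioning of the last query step determines the number of output files.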