Tags: azure, azure-blob-storage, azure-eventhub, azure-stream-analytics

Non-partitioned Stream Analytics Job output


In Azure, I have an Event Hub with a partition count of 5 and a Stream Analytics job that persists data from the hub to Blob Storage as-is, in JSON format. As a result, 5 files are created to store the incoming data.

Is it possible, without changing the hub's partition count, to configure the Stream Analytics job so that it saves all the data to a single file?


Solution

  • After experimenting with the partitioning suggested by this answer, I found that my goal can be achieved by changing the Stream Analytics job configuration.

    Stream Analytics jobs have different compatibility levels, and the latest one at the time of writing (1.2) introduced automatic parallel query execution for input sources with multiple partitions:

    Previous levels: Azure Stream Analytics queries required the use of PARTITION BY clause to parallelize query processing across input source partitions.

    1.2 level: If query logic can be parallelized across input source partitions, Azure Stream Analytics creates separate query instances and runs computations in parallel.

    So when I changed the job's compatibility level to 1.1, where the query is not parallelized unless PARTITION BY is used, it started writing all the output to a single file in Blob Storage.
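
To illustrate why the number of output files follows the number of parallel query instances, here is a toy model in Python. It does not call any Azure API. The function names, the `output_N.json` file names, and the event shape are all made up for illustration, and real Stream Analytics blob naming works differently. It only sketches the idea, assuming each query instance writes its own file.

```python
from collections import defaultdict

# Toy model only: no Azure SDK involved, all names are hypothetical.

def write_parallel(events):
    """Model of level 1.2: one query instance per input partition,
    each instance writing to its own output file."""
    files = defaultdict(list)
    for partition_id, payload in events:
        files[f"output_{partition_id}.json"].append(payload)
    return dict(files)

def write_single(events):
    """Model of level 1.1 without PARTITION BY: a single query instance
    reads every partition and writes one output file."""
    return {"output_0.json": [payload for _, payload in events]}

# 20 events spread round-robin over 5 partitions, as in the question.
events = [(i % 5, {"seq": i}) for i in range(20)]

parallel_files = write_parallel(events)
single_files = write_single(events)

print(len(parallel_files), "files with per-partition instances")
print(len(single_files), "file with a single instance")
```

In this model the same events end up in either five files or one. The trade-off is that the single-instance variant gives up the parallelism across partitions.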