
Azure event hub capture multiple events in a single file


We are planning to use Azure Event Hubs. Our app sends events to the event hub one at a time and does not specify a partition. We enabled Capture to write the data to Data Lake Storage Gen2.

When Capture is enabled, events are written to Data Lake Storage Gen2 as a single Avro file. Is it possible to write the events that occurred within a time frame as a single file (CSV or Avro)? Is it better to write each event as a single file, or to bulk events into a single file?


Solution

  • Is it possible to write the events that occurred within a time frame as a single file (CSV or Avro)?

    It depends on how many partitions the event hub uses. Each partition captures independently and writes a completed block blob at the time of capture.

    So if all events are sent to one partition (for example, your event hub has only one partition, or your code sends events to a specific partition), then only one Avro file is created per time frame.

    If events are distributed among partitions in round-robin fashion (the default behavior), then the number of Avro files created per time frame equals the number of partitions.

  • Is it better to write each event as a single file or bulk events in a single file?

    Bulking events into a single file is better because the storage cost is lower. But how many events end up in each file depends on how many you send during the time window or size window configured for Capture. For example, if the time window is 5 minutes and you send only one event in those 5 minutes, then only one file, holding that single event, is created.
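
To make the file-count reasoning above concrete, here is a minimal Python sketch. It does not call Azure. It models only the time window and assumes one file per partition per window that received at least one event. The function name `count_capture_files` and the tuple layout are hypothetical, chosen for illustration, and the size window and any empty-file behavior are deliberately ignored.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def count_capture_files(events, window=timedelta(minutes=5)):
    """Estimate files per partition under a simplified time-window model.

    `events` is an iterable of (partition_id, enqueued_time) tuples.
    Assumption: each partition writes one file per time window in which
    it received at least one event.
    """
    windows_per_partition = defaultdict(set)
    for partition_id, enqueued_time in events:
        # Index of the fixed-size time window this event falls into.
        bucket = int(enqueued_time.timestamp() // window.total_seconds())
        windows_per_partition[partition_id].add(bucket)
    return {p: len(buckets) for p, buckets in windows_per_partition.items()}


# Eight events, 30 seconds apart, all inside one 5-minute window.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
times = [start + timedelta(seconds=30 * i) for i in range(8)]

# Case 1: every event goes to partition "0".
single_partition = count_capture_files([("0", t) for t in times])

# Case 2: events spread round-robin over four partitions.
round_robin = count_capture_files(
    [(str(i % 4), t) for i, t in enumerate(times)]
)

print(single_partition)           # one file in total
print(sum(round_robin.values()))  # one file per partition, four in total
```

Under this model, the same eight events produce one file when they all land in one partition and four files when they are spread over four partitions, which matches the answer above.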