azure, azure-eventhub, azure-stream-analytics

What happens to events in Event Hub after Stream Analytics does its work and routes them to Service Bus?


I have the following scenario:

  1. The event hub (EH1) is configured with a retention policy of 7 days.
  2. Producers publish events to EH1.
  3. The events from EH1 are routed by Stream Analytics (SA), after it performs certain calculations over 1-hour time windows, to Service Bus, which receives both the raw events (as messages) and the summarized calculations.
  4. Let's say that over the 24 hours of day 1, producers publish 1 million events to EH1.
  5. SA kicks in and routes the raw events as well as summarized calculations (over 1 hour periods) to service bus.
  6. Assume that after day 1, there are no events pushed to EH1 for next 15 days.

Questions:

  1. How long will the 1 million raw events (from day 1) stay in EH1?
  2. Will those 1 million raw events (from day 1) still be there on day 2 (after the 1st hour) through day 7 (because the retention policy is 7 days)? Or will they be gone after day 1, once SA is done processing all of them? If neither, what else happens?
  3. What metrics should I look at in EH1 to prove whatever the answer is to both (1) and (2)?

Solution

  • First of all, take a look at how consumer groups work.

    In short, every consumer (any app or code that receives events from the event hub) reads events through a consumer group (call it cg_1 here). Reading does not remove events from the event hub. If the consumer checkpoints its position, the next read through cg_1 resumes after the events it has already read, so those events are not delivered again.

    If you switch to another consumer group (for example, a newly created one named cg_2), you can read all the data again, even though it has already been read through cg_1.

    So for your questions:

    #1: Since you have configured a retention policy of 7 days, the raw events are kept in the event hub for 7 days. Once the events have been received through a consumer group, a consumer that resumes from its checkpoint in that group will not receive them again, but you can use another consumer group to receive the same data again.

    #2: Same as question 1: the raw events stay in the event hub for the retention period you have configured. They are not removed when SA finishes processing them.

    #3: There is no metric that directly shows whether individual events are still retained, but you can easily write client code: create a new consumer group, then read the data through it to check whether it is still there.
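
The behaviour described above can be illustrated with a small sketch. This is a toy in-memory model, not the Azure SDK: the class `ToyEventHub`, its methods, and the consumer group name cg_3 are all hypothetical and exist only for illustration. It assumes that events expire purely by age and that each consumer group keeps its own independent read position.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ToyEventHub:
    """Toy model of a single event hub partition (hypothetical, not the Azure SDK)."""
    retention: timedelta
    events: list = field(default_factory=list)       # (sequence number, enqueued time, body)
    checkpoints: dict = field(default_factory=dict)  # consumer group -> next sequence number
    next_seq: int = 0

    def publish(self, body, now):
        self.events.append((self.next_seq, now, body))
        self.next_seq += 1

    def expire(self, now):
        # Assumption: only age removes events; being read never does.
        self.events = [e for e in self.events if now - e[1] < self.retention]

    def read(self, group, now):
        """Return events this group has not yet seen, then advance its checkpoint."""
        self.expire(now)
        start = self.checkpoints.get(group, 0)
        batch = [body for seq, _, body in self.events if seq >= start]
        self.checkpoints[group] = self.next_seq
        return batch


day1 = datetime(2024, 1, 1)
hub = ToyEventHub(retention=timedelta(days=7))
for i in range(5):
    hub.publish(f"event-{i}", day1)

first = hub.read("cg_1", day1 + timedelta(hours=1))  # cg_1 sees all 5 events
again = hub.read("cg_1", day1 + timedelta(days=1))   # nothing new for cg_1
other = hub.read("cg_2", day1 + timedelta(days=6))   # cg_2 still sees all 5 events
late = hub.read("cg_3", day1 + timedelta(days=8))    # retention has passed, nothing left

print(len(first), len(again), len(other), len(late))
```

In this model the day 1 events remain readable through a fresh consumer group on day 6, long after cg_1 consumed them, and disappear only once the 7-day retention period has passed. Real Event Hubs retention and checkpointing have more detail than this sketch captures (for example, partitions and client-managed checkpoints).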