Tags: azure, partitioning, azure-eventhub, azure-java-sdk

azure-sdk-for-java eventhubs Partition has been lost


We recently deployed an Azure Event Hubs Java receiver/listener client by following the Azure docs.

We observed the following error, raised from processError and also from processPartitionClose:

Error occurred in partition14 - connectionId[MF_5fba9c_1636350888640] sessionName[eventhub-name/ConsumerGroups/consumer-group-name/Partitions/14] entityPath[eventhub-name/ConsumerGroups/consumer-group-name/Partitions/14] linkName[14_500701_1636350888641] Cannot create receive link from a closed session., errorContext[NAMESPACE: namespace.servicebus.windows.net. ERROR CONTEXT: N/A, PATH: eventhub-name/ConsumerGroups/consumer-group-name/Partitions/14]
ERROR  | Partition has been lost 14 reason LOST_PARTITION_OWNERSHIP

Questions:

  1. Does the azure-sdk-for-java Event Hubs client reconnect automatically after such a partition loss?
  2. If not, what is the best practice before restarting manually?
    • Do I need to update the checkpoint manually?
    • Do I need to do anything about the ownership?

This is our SDK setup, with sample code:

EventProcessorClientBuilder eventProcessorClientBuilder = new EventProcessorClientBuilder()
        .checkpointStore(new BlobCheckpointStore(blobContainerAsyncClient))
        .connectionString(getEventHubConnectionString(), getEventHubName())
        .consumerGroup(getConsumerGroup())
        .initialPartitionEventPosition(initialPartitionEventPosition)
        .processEvent(PARTITION_PROCESSOR)
        .processError(ERROR_HANDLER)
        .processPartitionClose(CLOSE_HANDLER);

EventProcessorClient eventProcessorClient = eventProcessorClientBuilder.buildEventProcessorClient();
// Starts the event processor
eventProcessorClient.start();

// Logs the partition id and the message of the error reported for it.
private final Consumer<ErrorContext> ERROR_HANDLER = errorContext -> {
    log.error("Error occurred in partition" + errorContext.getPartitionContext().getPartitionId()
            + " - " + errorContext.getThrowable().getMessage());
};

// Logs the close reason, then checkpoints the last processed event if needed.
private final Consumer<CloseContext> CLOSE_HANDLER = closeContext -> {
    log.error("Partition has been lost " + closeContext.getPartitionContext().getPartitionId()
            + " reason " + closeContext.getCloseReason());

    // lastEvent is declared elsewhere (not shown); it holds the most recently processed EventContext.
    // Checkpoint it on close unless its sequence number is a multiple of 10.
    EventContext lastContext = lastEvent.get();
    if (lastContext != null && (lastContext.getEventData().getSequenceNumber() % 10) != 0) {
        lastContext.updateCheckpoint();
    }
};

JDK: 1.8

<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-messaging-eventhubs-checkpointstore-blob</artifactId>
    <version>1.10.0</version>
</dependency>

<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-messaging-eventhubs</artifactId>
    <version>5.10.1</version>
</dependency>

I did come across GitHub issue 15164, but could not find this case mentioned anywhere.


Solution

  • Does the azure-sdk-for-java Event Hubs client reconnect automatically after such a partition loss?

    Yes, the EventProcessorClient in the azure-messaging-eventhubs library reconnects to such partitions automatically. You don't need to change anything manually.

    If multiple EventProcessorClient instances are running, all processing events from the same Event Hub and using the same consumer group, you can see this LOST_PARTITION_OWNERSHIP close reason on one processor because ownership of the partition might have been claimed by another processor. The checkpoint is read from the checkpoint store (Storage Blob in your code sample above) and processing resumes from the next sequence number.

    Please refer to the documentation on partition ownership and checkpointing for more details.
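To make the resume behaviour described in the answer concrete, here is a minimal in-memory sketch. It does not call the Azure SDK: `CheckpointResumeSketch`, `InMemoryCheckpointStore` and `Processor` are hypothetical names invented for illustration. It assumes a checkpoint is written for every sequence number that is a multiple of 10, which is what the `% 10` check in the question's close handler suggests but is not shown in the question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * In-memory model of checkpoint-based resume after a partition changes owner.
 * Nothing here uses the Azure SDK; every type below is a hypothetical stand-in.
 */
final class CheckpointResumeSketch {

    /** Stand-in for the checkpoint store: partition id -> last checkpointed sequence number. */
    static final class InMemoryCheckpointStore {
        private final Map<String, Long> checkpoints = new HashMap<>();

        void update(String partitionId, long sequenceNumber) {
            checkpoints.put(partitionId, sequenceNumber);
        }

        /** Returns the sequence number to read next, or startIfNone when no checkpoint exists. */
        long nextSequence(String partitionId, long startIfNone) {
            Long last = checkpoints.get(partitionId);
            return last == null ? startIfNone : last + 1;
        }
    }

    /** Stand-in for one processor instance sharing the store with others. */
    static final class Processor {
        final String name;
        final List<Long> processed = new ArrayList<>();
        private final InMemoryCheckpointStore store;

        Processor(String name, InMemoryCheckpointStore store) {
            this.name = name;
            this.store = store;
        }

        /**
         * Processes events from the stored checkpoint up to lastAvailable (inclusive),
         * checkpointing every sequence number that is a multiple of 10 (assumed policy).
         */
        void run(String partitionId, long lastAvailable) {
            long next = store.nextSequence(partitionId, 0L);
            for (long seq = next; seq <= lastAvailable; seq++) {
                processed.add(seq);
                if (seq % 10 == 0) {
                    store.update(partitionId, seq);
                }
            }
        }
    }

    public static void main(String[] args) {
        InMemoryCheckpointStore store = new InMemoryCheckpointStore();
        Processor a = new Processor("A", store);
        Processor b = new Processor("B", store);

        // A processes 0..23; its last checkpoint is 20. It then loses ownership.
        a.run("14", 23);
        // B claims partition 14 and resumes right after the stored checkpoint.
        b.run("14", 30);

        System.out.println("B started at " + b.processed.get(0));
    }
}
```

Running `main` prints `B started at 21`. In this model, events 21 to 23 are handled by both A and B, because A processed them after its last checkpoint. The same effect is worth keeping in mind with a real checkpoint store: events processed after the last checkpoint may be delivered again to the new owner.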