I was hoping for some guidance on how to use the EventProcessorHost with a worker role. Basically I am hoping to have the EventProcessorHost process the partitions in parallel and I'm wondering where I should go about placing this type of code within the worker role and if I'm missing anything key.
var manager = NamespaceManager.CreateFromConnectionString(connectionString);
var desc = manager.CreateEventHubIfNotExistsAsync(path).Result;
var client = Microsoft.ServiceBus.Messaging.EventHubClient.CreateFromConnectionString(connectionString, path);
var host = new EventProcessorHost(hostname, path, consumerGroup, connectionString, blobStorageConnectionString);
EventHubProcessorFactory<EventData> factory = new EventHubProcessorFactory<EventData>();
host.RegisterEventProcessorFactoryAsync(factory);
Everything I've read says the EventProcessorHost will divide up the partitions on its own, but is the above code sufficient to process all the partitions asynchronously?
Here's a simplified version of how we process our event hub from an Worker Role. We keep the instance in the mainWorker role and call the IEventProcessor to start processing it.
This way we can call it and close it down when the Worker Responds to shutdown events etc.
EDIT:
As for the processing it in parallel, the IEventProcessor class will just grab 10 more events from the event hub when it's finished processing the current one. Handling all the fancy partition leasing for you.
It's a synchronous workflow, When I scale to multiple worker roles I start to see the partitions get split between instances and it gets faster etc. You'd have to roll your own solution if you wanted it to process the event hub in a different way.
public class WorkerRole : RoleEntryPoint
{
private readonly CancellationTokenSource _cancellationTokenSource = new CancellationTokenSource();
private readonly ManualResetEvent _runCompleteEvent = new ManualResetEvent(false);
private EventProcessorHost _eventProcessorHost;
public override bool OnStart()
{
ThreadPool.SetMaxThreads(4096, 2048);
ServicePointManager.DefaultConnectionLimit = 500;
ServicePointManager.UseNagleAlgorithm = false;
ServicePointManager.Expect100Continue = false;
var eventClient = EventHubClient.CreateFromConnectionString("consumersConnectionString",
"eventHubName");
_eventProcessorHost = new EventProcessorHost(Dns.GetHostName(), eventClient.Path,
eventClient.GetDefaultConsumerGroup().GroupName,
"consumersConnectionString", "blobLeaseConnectionString");
return base.OnStart();
}
public override void Run()
{
try
{
RunAsync(this._cancellationTokenSource.Token).Wait();
}
finally
{
_runCompleteEvent.Set();
}
}
private async Task RunAsync(CancellationToken cancellationToken)
{
// starts processing here
await _eventProcessorHost.RegisterEventProcessorAsync<EventProcessor>();
while (!cancellationToken.IsCancellationRequested)
{
await Task.Delay(TimeSpan.FromMinutes(1));
}
}
public override void OnStop()
{
_eventProcessorHost.UnregisterEventProcessorAsync().Wait();
_cancellationTokenSource.Cancel();
_runCompleteEvent.WaitOne();
base.OnStop();
}
}
I have multiple processors for the specific partitions (you can guarantee FIFO this way), but you can implement you're own logic easily i.e. skip the use of a EventDataProcessor class and Dictionary lookup in my example and just implement some logic within the ProcessEventsAsync method.
public class EventProcessor : IEventProcessor
{
private readonly Dictionary<string, IEventDataProcessor> _eventDataProcessors;
public EventProcessor()
{
_eventDataProcessors = new Dictionary<string, IEventDataProcessor>
{
{"A", new EventDataProcessorA()},
{"B", new EventDataProcessorB()},
{"C", new EventDataProcessorC()}
}
}
public Task OpenAsync(PartitionContext context)
{
return Task.FromResult<object>(null);
}
public async Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
{
foreach(EventData eventData in messages)
{
// implement your own logic here, you could just process the data here, just remember that they will all be from the same partition in this block
try
{
IEventDataProcessor eventDataProcessor;
if(_eventDataProcessors.TryGetValue(eventData.PartitionKey, out eventDataProcessor))
{
await eventDataProcessor.ProcessMessage(eventData);
}
}
catch (Exception ex)
{
_//log exception
}
}
await context.CheckpointAsync();
}
public async Task CloseAsync(PartitionContext context, CloseReason reason)
{
if (reason == CloseReason.Shutdown)
await context.CheckpointAsync();
}
}
Example of one of our EventDataProcessors
public interface IEventDataProcessor
{
Task ProcessMessage(EventData eventData);
}
public class EventDataProcessorA : IEventDataProcessor
{
public async Task ProcessMessage(EventData eventData)
{
// Do Something specific with data from Partition "A"
}
}
public class EventDataProcessorB : IEventDataProcessor
{
public async Task ProcessMessage(EventData eventData)
{
// Do Something specific with data from Partition "B"
}
}
Hope this helps, it's been rock solid for us so far and scales easily to multiple instances