
How to integrate TPL Dataflow in ASP.NET Core


Hello. I have been looking at TPL Dataflow with great interest lately, and I want to integrate it into my ASP.NET Core application. I want to use it as a pipeline to which multiple methods from different parts of the application can post data. What I do not know is where to store the chain of linked blocks when it has to be reachable from multiple places.

Producer

public class Producer
{
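    // Fields rather than get-only properties, because they are assigned in InitializeChain, outside the constructor.
    private BufferBlock<int> startBlock;
    private ActionBlock<int> ioBlock;
    private readonly IOService service;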

    private void InitializeChain()
    {
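        this.startBlock = new BufferBlock<int>();
        var transformLink = new TransformBlock<int, string>([something]);
        // some chain of blocks here
        this.ioBlock = new ActionBlock<int>(async x => await this.service.WriteAsync(x));
        // LinkTo returns an IDisposable rather than the target block, so each link is a separate call.
        this.startBlock.LinkTo([someBlock]);
        [someBlock].LinkTo([someOtherBlock]);
        // ...... and so on, until the last block in the chain is linked to ioBlock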
    }
    public async Task AddAsync(int data)
    {
        // SendAsync waits asynchronously if the block is bounded and currently full.
        await this.startBlock.SendAsync(data);
    }
    public Producer(IOService service)
    {
        this.service=service;
        this.InitializeChain();
    }
}

API Producers
I envision this Producer being called from multiple parts of my application. I'll use controllers here for brevity:

public class C1 : Controller
{
    private readonly Producer producer;

    [HttpPost]
    [Route([someroute])]
    public async Task SomeRoute(int data)
    {
        await this.producer.AddAsync(data);
    }

    [HttpGet]
    [Route([someotherroute])]
    public async Task SomeOtherRoute(int data)
    {
        await this.producer.AddAsync(data);
    }

    public C1(Producer producer)
    {
        this.producer = producer;
    }
}

Startup

public void ConfigureServices(IServiceCollection services)
{
    services.AddSingleton<Producer>();
}

This can be extended to multiple controllers, or to callers deeper in the call hierarchy.

Now my question is:
How should the Producer that holds the Dataflow chain be injected? Should it be transient? Should the blocks be instantiated on every call?

I do not know whether this design is OK. I know TPL Dataflow is thread-safe, but can it be used this way?

P.S. I basically do not know in what form to keep my Dataflow pipeline, or what lifetime to give it, if I want it to be available for the entire lifetime of my ASP.NET Core application. I want to collect data from multiple endpoints (directly or deeper in the call hierarchy), batch it, transform it, and control how it is finally written to an external destination (an async operation). Does this play nicely with the thread pool that ASP.NET Core already uses?

P.S. 2: This question also haunts me for an Rx equivalent.


Solution

  • I recommend not linking your controllers directly to your background processor. For reliability, there should be a persistent queue between them. This can be an Azure Queue, Amazon Simple Queue Service (SQS), or even something old-school like MSMQ or a database.

    Your processor can be independent (an Azure Function, an AWS Lambda function, or something old-school like a Win32 service), or it can be part of your web app (an ASP.NET Core hosted service).

    Your controller writes to the persistent queue and then returns. Your processor then reads messages from the queue and processes them. Your processor is what would use TPL Dataflow or Rx - whichever is more natural.
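
As a rough illustration of the hosted-service variant described above, the sketch below builds the Dataflow chain once inside a BackgroundService and feeds it from the queue. It is only a minimal sketch under assumptions: IPersistentQueue, IBatchWriter and PipelineWorker are hypothetical names that do not come from the question or from any library, the batch size of 10 and the capacity of 100 are arbitrary, and a real queue client would also need acknowledgement and retry handling.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
using Microsoft.Extensions.Hosting;

// Hypothetical abstraction over the persistent queue (Azure Queue, SQS, MSMQ, a database table, ...).
public interface IPersistentQueue
{
    Task EnqueueAsync(int data, CancellationToken ct);
    Task<int> DequeueAsync(CancellationToken ct); // assumed to wait until a message is available
}

// Hypothetical sink for the final write; it plays the role of the question's IOService.
public interface IBatchWriter
{
    Task WriteAsync(IReadOnlyList<string> batch);
}

public sealed class PipelineWorker : BackgroundService
{
    private readonly IPersistentQueue queue;
    private readonly TransformBlock<int, string> transform;
    private readonly BatchBlock<string> batch;
    private readonly ActionBlock<string[]> write;

    public PipelineWorker(IPersistentQueue queue, IBatchWriter writer)
    {
        this.queue = queue;

        // The blocks are created once and live as long as the hosted service.
        transform = new TransformBlock<int, string>(
            x => x.ToString(),
            new ExecutionDataflowBlockOptions { BoundedCapacity = 100 });
        batch = new BatchBlock<string>(10);
        write = new ActionBlock<string[]>(async items => await writer.WriteAsync(items));

        // PropagateCompletion lets Complete() on the first block flow through to the last one.
        var link = new DataflowLinkOptions { PropagateCompletion = true };
        transform.LinkTo(batch, link);
        batch.LinkTo(write, link);
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        try
        {
            while (!stoppingToken.IsCancellationRequested)
            {
                int data = await queue.DequeueAsync(stoppingToken);
                // SendAsync waits when the bounded block is full, which throttles the reads.
                await transform.SendAsync(data, stoppingToken);
            }
        }
        catch (OperationCanceledException)
        {
            // Expected during shutdown.
        }
        finally
        {
            // Flush what is already in the pipeline before the host stops.
            transform.Complete();
            await write.Completion;
        }
    }
}
```

With this shape, the worker would be registered through services.AddHostedService<PipelineWorker>(), and a controller action would only call EnqueueAsync on the injected queue and return. The controllers never hold a reference to the blocks, so the pipeline's lifetime is tied to the hosted service rather than to any request.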