c# .net web-scraping tpl-dataflow

Creating a TPL Dataflow TransformBlock that transforms single input into multiple outputs


I am developing a project based on the dataflow pattern, using the TPL Dataflow library from .NET.

I learned about this library only recently, so I am still new to it. I am trying to build a pipeline of several blocks. The first block begins with a list of configuration interfaces on its input stack. For each configuration object, this block creates an object that loads a list of URLs from a file. After the URLs have been loaded, I would like to place each of them, individually, on the output stack for this block.

My problem is that I can't seem to find a way to have a transform function receive an input object and return a list of outputs that will be placed individually in the stack. Am I missing something here?

private async Task<Uri> LoadUrl(IUrlLoaderSettings loaderSettings)
{
    IUrlLoader newLoader = CreateSeedLoader(loaderSettings);
    List<Uri> urls = await newLoader.LoadAsync().ConfigureAwait(false);

    foreach (Uri url in urls)
    {
        // each url loaded should be posted on output stack.
    }

    return null;
}

// Url Loader block.
TransformBlock<IUrlLoaderSettings, Uri> loaderBlock = new TransformBlock<IUrlLoaderSettings, Uri>(loaderSettings => LoadUrl(loaderSettings));

Basically, I want an input stack that holds configuration objects, where each object generates a list of outputs. I don't want that list to be placed directly on the output stack as a single item, because I want the next block to process each URL separately, not the whole list at once.

Thanks in advance!


Solution

  • You probably need a TransformManyBlock. This block invokes a Func<TInput, IEnumerable<TOutput>> for each input it receives, and propagates the produced items of each IEnumerable<TOutput> individually. The block implements the IPropagatorBlock<TInput, TOutput> interface.
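
A minimal sketch of how this could look for the loader block in the question. Besides the synchronous delegate described above, TransformManyBlock also has a constructor overload that takes a Func<TInput, Task<IEnumerable<TOutput>>>, which suits an async loader. Everything not in the question is an assumption made for illustration: the Host property, the DemoSettings class, the LoadUrlsAsync stand-in (it fabricates two URLs instead of reading a file), the example host names and the consumerBlock.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical stand-ins for the types in the question.
public interface IUrlLoaderSettings { string Host { get; } }

public sealed class DemoSettings : IUrlLoaderSettings
{
    public DemoSettings(string host) { Host = host; }
    public string Host { get; }
}

public static class UrlPipelineSketch
{
    // Stand-in for CreateSeedLoader(settings).LoadAsync(): returns two
    // made-up URLs per settings object instead of reading them from a file.
    private static async Task<IEnumerable<Uri>> LoadUrlsAsync(IUrlLoaderSettings settings)
    {
        await Task.Yield(); // simulate asynchronous work
        return new List<Uri>
        {
            new Uri($"http://{settings.Host}/page1"),
            new Uri($"http://{settings.Host}/page2"),
        };
    }

    public static async Task<List<Uri>> RunAsync(IEnumerable<IUrlLoaderSettings> allSettings)
    {
        // One settings object in, many Uri items out. The block unpacks the
        // returned sequence and offers each Uri to linked targets one by one.
        var loaderBlock = new TransformManyBlock<IUrlLoaderSettings, Uri>(
            settings => LoadUrlsAsync(settings));

        // Hypothetical next stage: receives a single Uri per invocation.
        var received = new List<Uri>();
        var consumerBlock = new ActionBlock<Uri>(url => received.Add(url));

        loaderBlock.LinkTo(
            consumerBlock,
            new DataflowLinkOptions { PropagateCompletion = true });

        foreach (IUrlLoaderSettings settings in allSettings)
        {
            loaderBlock.Post(settings);
        }

        // Signal that no more input is coming, then wait for the pipeline to drain.
        loaderBlock.Complete();
        await consumerBlock.Completion.ConfigureAwait(false);
        return received;
    }

    public static async Task Main()
    {
        var settings = new IUrlLoaderSettings[]
        {
            new DemoSettings("a.example"),
            new DemoSettings("b.example"),
        };

        foreach (Uri url in await RunAsync(settings))
        {
            Console.WriteLine(url);
        }
    }
}
```

The loader method now returns the whole sequence instead of a single Uri, so there is no need to post items manually inside the foreach loop or to return null. With the default block options, the ActionBlock processes one item at a time, which is why the plain List<Uri> in this sketch is not guarded by a lock.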