Tags: c#, memory-management, task-parallel-library, tpl-dataflow

TPL: Dispose processed items


In C#, I am using Task Parallel Library (TPL) Dataflow blocks to download an image, process the image, and save the analysis results. A simplified version of the code follows.

// Pipeline: download -> process -> save.
var getImage = new TransformBlock<int, Image>(GetImage);
var proImage = new TransformBlock<Image, double>(ProcessImage);
var saveRes = new ActionBlock<double>(SaveResult);

// Propagate completion from each block to the next one.
var linkOptions = new DataflowLinkOptions() { PropagateCompletion = true };

getImage.LinkTo(proImage, linkOptions);
proImage.LinkTo(saveRes, linkOptions);

for (int x = 0; x < 1000000; x++)
    getImage.Post(x);
getImage.Complete(); // no more input will be posted

saveRes.Completion.Wait(); // block until every result has been saved

This works as expected, except for memory usage. I expect int_x, image_x, and double_x to be disposed once the pipeline has processed that iteration. In other words, I expect every resource created while getImage, proImage, and saveRes execute for iteration x to be disposed when the last block finishes with it. However, this implementation keeps all the objects in memory until I exit the scope of the pipeline.

Am I missing something? Is this the expected behavior of TPL? And is there an option I can set so that the resources are released at the end of each iteration?

Update

Following the suggestion in the comments, I rewrote the code using BufferBlock and SendAsync, as shown below. However, I do not think this releases the resources consumed by each task. Setting BoundedCapacity only causes my program to halt, at what I believe is the point where it reaches the BoundedCapacity limit.

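// TransformBlock and ActionBlock require ExecutionDataflowBlockOptions;
// BufferBlock accepts it too, because it derives from DataflowBlockOptions.
var blockOpts = new ExecutionDataflowBlockOptions()
{ BoundedCapacity = 100 };

var imgBuffer = new BufferBlock<int>(blockOpts);

var getImage = new TransformBlock<int, Image>(GetImage, blockOpts);
var proImage = new TransformBlock<Image, double>(ProcessImage, blockOpts);
var saveRes = new ActionBlock<double>(SaveResult, blockOpts);

var linkOptions = new DataflowLinkOptions() { PropagateCompletion = true };

imgBuffer.LinkTo(getImage, linkOptions);
getImage.LinkTo(proImage, linkOptions);
proImage.LinkTo(saveRes, linkOptions);

for (int x = 0; x < 1000000; x++)
    await imgBuffer.SendAsync(x); // waits while the buffer is full
imgBuffer.Complete(); // complete the head block; completion propagates through the links

await saveRes.Completion; // wait without blocking, since this code is already async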
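
As background for the halt described above: on a bounded block, Post returns false immediately when the block is full, while SendAsync returns a task that completes only once the block has room. The sketch below illustrates this with a BufferBlock of capacity 1. The class name BoundedDemo and the method RunAsync are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical demo class; not part of the pipeline above.
public static class BoundedDemo
{
    public static async Task<(bool First, bool Second, bool PendingDone, bool Third)> RunAsync()
    {
        var buffer = new BufferBlock<int>(new DataflowBlockOptions { BoundedCapacity = 1 });

        bool first = buffer.Post(1);   // accepted: the buffer has a free slot
        bool second = buffer.Post(2);  // declined: the buffer is full and Post does not wait

        Task<bool> pending = buffer.SendAsync(3); // postponed until a slot frees up
        bool pendingDone = pending.IsCompleted;   // still pending at this point

        buffer.Receive();              // remove item 1, freeing the slot
        bool third = await pending;    // the postponed item is now accepted

        return (first, second, pendingDone, third);
    }

    public static async Task Main()
    {
        var r = await RunAsync();
        Console.WriteLine($"{r.First} {r.Second} {r.PendingDone} {r.Third}");
    }
}
```

In the pipeline above, this means the producer loop pauses at await SendAsync while the head block is full and resumes when a downstream block frees a slot. That caps the number of items in flight, but it does not call Dispose on any of them.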

Solution

  • Is this the expected behavior of TPL?

    Yes. TPL Dataflow does not keep processed objects rooted (once a block is done with them, they are eligible for garbage collection and finalization), but it does not dispose them, either.

    And is there any option to set so that the resources are released at the end of each iteration?

    No.

    How can I make sure Dispose is called automatically after the last block/action has executed on an input?

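    To dispose objects, your code has to call Dispose itself. The natural place is the last block that uses the object, which here is the one running ProcessImage. This is fairly easily done by modifying ProcessImage or wrapping it in a delegate.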

    If ProcessImage is synchronous:

    var proImage = new TransformBlock<Image, double>(image => { using (image) return ProcessImage(image); });
    

    or if it's asynchronous:

    var proImage = new TransformBlock<Image, double>(async image => { using (image) return await ProcessImage(image); });
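
To see the wrapper at work end to end, here is a minimal, self-contained sketch of the synchronous variant. It is an illustration under stated assumptions, not the asker's actual code: FakeImage is a hypothetical stand-in for Image that counts Dispose calls, the GetImage, ProcessImage, and SaveResult bodies are placeholders, and RunPipelineAsync is a hypothetical helper name.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical stand-in for Image; counts Dispose calls so the effect is observable.
public sealed class FakeImage : IDisposable
{
    public static int Disposed;
    public int Id { get; }
    public FakeImage(int id) => Id = id;
    public void Dispose() => Interlocked.Increment(ref Disposed);
}

public static class Program
{
    // Placeholder implementations (assumptions, not the real GetImage/ProcessImage/SaveResult).
    static FakeImage GetImage(int x) => new FakeImage(x);
    static double ProcessImage(FakeImage image) => image.Id * 0.5;
    static void SaveResult(double result) { }

    // Runs the pipeline for `count` items and returns how many images were disposed.
    public static async Task<int> RunPipelineAsync(int count)
    {
        int before = FakeImage.Disposed;
        var blockOpts = new ExecutionDataflowBlockOptions { BoundedCapacity = 100 };
        var linkOpts = new DataflowLinkOptions { PropagateCompletion = true };

        var getImage = new TransformBlock<int, FakeImage>(GetImage, blockOpts);
        // The image is disposed as soon as ProcessImage has produced its result.
        var proImage = new TransformBlock<FakeImage, double>(image =>
        {
            using (image) return ProcessImage(image);
        }, blockOpts);
        var saveRes = new ActionBlock<double>(SaveResult, blockOpts);

        getImage.LinkTo(proImage, linkOpts);
        proImage.LinkTo(saveRes, linkOpts);

        for (int x = 0; x < count; x++)
            await getImage.SendAsync(x); // waits while the head block is full
        getImage.Complete();
        await saveRes.Completion;

        return FakeImage.Disposed - before;
    }

    public static async Task Main()
    {
        int disposed = await RunPipelineAsync(1000);
        Console.WriteLine($"Disposed {disposed} of 1000 images");
    }
}
```

Disposing inside the proImage delegate works here because proImage is the last block that touches the image. If a later block also needed it, the using would have to move to that block instead.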