Referring to class method in parfor loop: significant memory usage


Example code of a class:

classdef testcls
    methods
        function sayhello(~)
            disp('Hello! ')
        end
    end 
end

Now, if I call the method in a parfor loop as below:

A = testcls;
parfor ii = 1:4
    A.sayhello()
end

Mlint warns about a performance issue with the use of A in the loop:

The entire array or structure 'obj' is a broadcast variable. This might result in unnecessary communication overhead.

I can suppress this message by using a function handle:

A = testcls;
f = @A.sayhello;
parfor ii = 1:4
    f()
end

But my question is: will doing this help with speed in any way? Is there a better way to call a method in parfor?

Also, does this get more complicated if the function takes input arguments and returns outputs?

classdef testcls
    methods
        function [out1,out2] = sayhello(~,n)
            out1 = (['Hello! ', num2str(n)]);
            out2 = n;
        end
    end
end
    
A = testcls;
f = @A.sayhello;
[a,b] = deal(cell(4,1));
parfor ii = 1:4
    [a{ii},b{ii}] = feval(f,ii);
end

EDIT:

I have observed significant resource consumption from memory copy operations. Basically, the job dispatcher creates an identical object for each worker, including all modified properties.

The f = @A.sayhello; usage does not stop MATLAB from copying the entire object to every worker, even when the method itself neither reads nor stores any class property.

I think this is how transparency is ensured. However, when the amount of data is huge, this becomes a real headache.

Is there a way to keep sayhello packaged in the class, rather than isolating it into a standalone function file, without triggering a copy of the entire object?


EDIT: Thanks to @gnovice for the suggested answer. I have made a test case to compare parfor with a static method, parfor with a non-static method, and serial execution using arrayfun.

Test case 1: parfor with non-static method (control)

[Figure: memory usage record, parfor with non-static method]

As the memory usage record shows, creating a single testcls object uses ~700MB of RAM (label 1). This is followed by a clear command (label 2), and the parfor loop runs above label 3. The peak usage during parfor is approximately 4 times that of a single object, which matches the 4 workers in the pool.

Test case 2: parfor with static method

[Figure: memory usage record, parfor with static method]

The test procedure and labels are the same as in test case 1. From this evidence, I conclude that making the method static is not, by itself, enough to prevent the pool from creating identical copies of the object on every worker.

Test case 3: Serial evaluation using arrayfun

[Figure: memory usage record, serial evaluation with arrayfun]

Since arrayfun evaluates serially (in no guaranteed order), there is no reason for it to use more memory than a single object needs, and the record agrees.

Example code:

classdef testcls
    properties
        D
    end
    methods (Static = false)
        function [out1,out2] = sayhello(~,n)
            out1 = (['Hello! ', num2str(n)]);
            out2 = n;
        end
    end
    methods
        function obj = testcls(~)
            obj.D = rand(1e8,1);
        end
    end
end

To run the test, use this script:

clear;clc;close all

A = testcls;
f = @A.sayhello;
parfor ii = 1:4
    feval(f,ii)
end

You may replace the parfor with arrayfun for serial validation.
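
For reference, the serial variant could look something like the sketch below. It assumes the testcls definition from the example code above (with the two-output sayhello) is on the path; the variable names a and b are only for illustration. Note that constructing A still allocates the large D property once.

```matlab
% Serial check with arrayfun instead of parfor.
% Assumes testcls.m from the example above is on the MATLAB path.
A = testcls;

% The anonymous function forwards both outputs of sayhello.
% 'UniformOutput', false collects each output in a cell array,
% which is needed because out1 is a char vector of varying length.
[a, b] = arrayfun(@(n) A.sayhello(n), 1:4, 'UniformOutput', false);

disp(a{1})   % first greeting string
disp(b{4})   % fourth numeric output
```

Everything here runs in the client session, so only the one copy of A exists.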


Edit 2024/5/29: Thanks @CrisLuengo for the comment.

In R2020a a new feature was introduced that supports thread pools instead of the traditional process pools. Theoretically with thread workers, memory copy would be unnecessary since threads share the memory space of the same process.
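
A minimal sketch of what using a thread pool might look like, assuming R2020a or later with Parallel Computing Toolbox. It uses a plain numeric array rather than the testcls object, since thread workers support only a subset of MATLAB functionality and I have not verified how user-defined classes behave there. The names D and s are only for illustration.

```matlab
% Run a parfor loop on a thread-based pool (R2020a or later).
delete(gcp('nocreate'));        % close any pool that is already open
pool = parpool("threads");      % thread workers live in the client process

D = ones(1e7, 1);               % large array used as a broadcast variable
s = zeros(4, 1);                % sliced output variable
parfor ii = 1:4
    % D is read-only inside the loop, so it is broadcast to the workers.
    s(ii) = sum(D) + ii;
end

delete(pool);                   % shut the pool down again
```

Watching the memory usage while this runs, once with parpool("threads") and once with a process pool, would be one way to check whether the broadcast copy really disappears.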

Further documentation explains the differences between thread and process pools. However, one thing it does not discuss is whether it matters that your task is compute-intensive or IO-intensive. I have recently spent more time on Python programming, where this makes a big difference, so it is what I care about and would dive into if I had the free time. In Python, the GIL allows only one thread to execute Python code at a time, so compute-intensive code on a thread pool gives no speed benefit. The operating system itself imposes no such limit, so the same may not hold in MATLAB. Some help with running tests would be appreciated.
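
One way to test this could be a rough timing comparison like the sketch below. It is only an assumption about how such a test might be set up: the workload (a scalar loop of sin calls), the task count nTasks and the iteration count nIter are arbitrary choices, and tic/toc timings are noisy. A scalar loop is used so that MATLAB's built-in multithreading of large array operations does not blur the comparison.

```matlab
% Compare a compute-bound loop run serially and on a thread pool.
nTasks = 8;          % number of independent tasks (arbitrary)
nIter  = 2e7;        % scalar iterations per task (arbitrary)

% Serial baseline.
outSerial = zeros(nTasks, 1);
tic
for ii = 1:nTasks
    s = 0;
    for k = 1:nIter
        s = s + sin(k)^2;
    end
    outSerial(ii) = s;
end
tSerial = toc;

% Same work on a thread-based pool (R2020a or later).
delete(gcp('nocreate'));
pool = parpool("threads");
outPar = zeros(nTasks, 1);
tic
parfor ii = 1:nTasks
    s = 0;                      % temporary variable, local to each iteration
    for k = 1:nIter
        s = s + sin(k)^2;
    end
    outPar(ii) = s;
end
tThreads = toc;
delete(pool);

fprintf('serial: %.2f s, thread pool: %.2f s\n', tSerial, tThreads);
```

If the thread pool time is clearly lower than the serial time on a multi-core machine, that would suggest thread workers are not serialized the way Python threads are under the GIL.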


Solution

  • For methods that don't have to reference any property of the class, it's probably best to make them static methods. From the documentation:

    Static methods are associated with a class, but not with specific instances of that class. These methods do not require an object of the class as an input argument, unlike ordinary methods which operate on specific objects of the class. You can call static methods without creating an object of the class.

    Since they can be called without having to create an object of that class, this should help you avoid the unnecessary duplication of the entire object across each worker.

    Example method:

    classdef testcls
      ...
      methods(Static)
        function sayhello
          disp('Hello!');
        end
      end
      ...
    end
    

    And to call it from each worker:

    testcls.sayhello();
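
    For the version with inputs and outputs, a sketch along the same lines might look like this. It assumes the two-output sayhello from the question has been moved into a methods(Static) block with the signature [out1,out2] = sayhello(n), which is not shown in the original code. The key point is that the loop body uses the class name and never mentions the object A, so A is not a broadcast variable.

```matlab
% Assumes testcls.m declares, inside methods(Static):
%     function [out1,out2] = sayhello(n)
%         out1 = ['Hello! ', num2str(n)];
%         out2 = n;
%     end
[a, b] = deal(cell(4, 1));      % sliced output variables
parfor ii = 1:4
    % Called through the class name, not through an instance or a
    % handle created from an instance, so no object is sent to workers.
    [a{ii}, b{ii}] = testcls.sayhello(ii);
end
```

    If the loop instead goes through f = @A.sayhello or A.sayhello(ii), the object is still referenced in the loop, which may be why making the method static alone did not change the memory usage in the question's test.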