
Is appdata shared between workers in a parallel pool?


I'm working on a complicated function that calls several subfunctions (within the same file). To pass data around, the setappdata/getappdata mechanism is used occasionally. Moreover, some subfunctions contain persistent variables (initialized once in order to save computations later).

I've been considering whether this function can be executed on several workers in a parallel pool, but became worried that there might be some unintended sharing of data that should be unique to each worker.

My question is: how can I tell whether data stored in global variables, persistent variables, and/or appdata is shared between the workers or unique to each one?

Several possibly-relevant things:

  1. In my case, tasks are completely parallel and their results should not affect each other in any way (parallelization is done simply to save time).
  2. There aren't any temporary files or folders being created, so there is no risk of one worker mistakenly reading the files that were left by another.
  3. All persistent and appdata-stored variables are created/assigned within subfunctions called from the parfor loop.

I know that each worker corresponds to a new process with its own memory space (and presumably, global/persistent/appdata workspace). Based on that and on this official comment, I'd say it's probable that such sharing does not occur... But how do we ascertain it?

Related material:

  1. This Q&A.
  2. This documentation page.

Solution

  • This is quite straightforward to test, and we shall do it in two steps.

    Step 1: Manual Spawning of "Workers"

    First, create these 3 functions:

    %% Worker 1:
    function q52623266_W1
    global a; a = 5;
    setappdata(0, 'a', a);
    someFuncInSameFolder();
    end
    

    %% Worker 2:
    function q52623266_W2
    global a; disp(a);
    disp(getappdata(0,'a'));
    someFuncInSameFolder();
    end
    

    function someFuncInSameFolder()
      persistent b; 
      if isempty(b)
        b = 10;
        disp('b is now set!');
      else
        disp(b);
      end
    end
    

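    Next we boot up 2 MATLAB instances (representing two different workers of a parallel pool), then run q52623266_W1 on one of them, wait for it to finish, and run q52623266_W2 on the other. If data is shared, the 2nd instance will print the values set by the first (5 for the global variable, 5 for the appdata, and 10 for the persistent variable). If it is not shared, the empty values print nothing and the persistent variable is initialized again. This results (on R2018b) in: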

    >> q52623266_W1
    b is now set!
    

    >> q52623266_W2
    b is now set!
    

    This means that the data is not shared. So far so good, but one might wonder whether two separate MATLAB instances faithfully represent an actual parallel pool. So we can adjust our functions a bit and move on to the next step.

    Step 2: Automatic Spawning of Workers

    function q52623266_Host
    
    spmd(2)
      if labindex == 1
        setupData();
      end
      labBarrier; % make sure that the setup stage was executed.
      if labindex == 2
        readData();
      end  
    end
    
    end
    
    function setupData
      global a; a = 5;
      setappdata(0, 'a', a);
      someFunc();
    end
    
    function readData
      global a; disp(a);
      disp(getappdata(0,'a'));
      someFunc();
    end
    
    function someFunc()
      persistent b; 
      if isempty(b)
        b = 10;
        disp('b is now set!');
      else
        disp(b);
      end
    end
    

    Running the above we get:

    >> q52623266_Host
    Starting parallel pool (parpool) using the 'local' profile ...
    connected to 2 workers.
    Lab 1: 
      b is now set!
    Lab 2: 
      b is now set!
    

    This again means that the data is not shared. Note that in the second step we used spmd, which should behave similarly to parfor for the purposes of this test.
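
    The spmd test above does not show what happens when several parfor iterations land on the same worker. The following is a minimal sketch of such a check, assuming a parallel pool is available. The names q52623266_ParforCheck and initOnce are hypothetical and not part of the original test. For each iteration it records the ID of the task that ran it (via getCurrentTask) and whether the persistent variable was still empty at that point:

    function q52623266_ParforCheck
    % Hypothetical check: which iterations see an uninitialized persistent?
    n = 8;                        % more iterations than workers in a small pool
    taskId    = zeros(1, n);      % ID of the task (worker) running iteration k
    firstCall = false(1, n);      % true if the persistent was empty at iteration k
    parfor k = 1:n
      id = 0;                     % stays 0 if the loop runs without a pool
      t = getCurrentTask();       % empty when not running on a worker
      if ~isempty(t)
        id = t.ID;
      end
      taskId(k)    = id;
      firstCall(k) = initOnce();
    end
    disp([taskId; firstCall]);    % row 1: task IDs, row 2: first-call flags
    end

    function wasFirst = initOnce()
      % Reports whether this process is calling the function for the first time.
      persistent b;
      wasFirst = isempty(b);
      if wasFirst
        b = 10;
      end
    end

    On a freshly started pool, one would expect a first-call flag of 1 at most once per distinct task ID. In other words, the state is private to each worker process, but it carries over between iterations that run on the same worker (and may carry over into later loops on the same pool), so code that relies on a "clean" persistent or appdata value per iteration should reset it explicitly.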