How can I preallocate a non-numeric vector in MATLAB?


I've often found myself doing something like this:

unprocessedData = fetchData();  % returns a vector of structs or objects
processedData = [];             % will be full of structs or objects

for dataIdx = 1 : length(unprocessedData) 
    processedDatum = process(unprocessedData(dataIdx));
    processedData = [processedData; processedDatum];
end

This works, but it isn't optimal: the processedData vector grows on every pass through the loop. Even mlint warns me that I should consider preallocating for speed.

Were data a vector of int8, I could do this:

% preallocate processed data array to prevent growth in loop
processedData = zeros(length(unprocessedData), 1, 'int8');

and modify the loop to fill vector slots rather than concatenate.
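For concreteness, here is a minimal sketch of that numeric case. The rawValues vector and the doubling step are made-up stand-ins for the output of fetchData() and for process(); they are assumptions for illustration only.

```matlab
% Hypothetical stand-ins: rawValues plays the role of fetchData()'s output,
% and doubling plays the role of process().
rawValues = int8([1 2 3 4 5]);

% Preallocate the full column vector once, before the loop.
processedData = zeros(length(rawValues), 1, 'int8');

for dataIdx = 1 : length(rawValues)
    % Fill an existing slot instead of concatenating.
    processedData(dataIdx) = 2 * rawValues(dataIdx);
end
```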

Is there a way to preallocate a vector so that it can subsequently hold structs or objects?


Update: inspired by Azim's answer, I've simply reversed the loop order. Processing the last element first forces preallocation of the entire vector in the first hit, as the debugger confirms:

unprocessedData = fetchData();

% note that processedData isn't declared outside the loop - this breaks 
% it if it'll later hold non-numeric data. Instead we exploit matlab's 
% odd scope rules which mean that processedData will outlive the loop
% inside which it is first referenced: 

for dataIdx = length(unprocessedData) : -1 : 1 
    processedData(dataIdx) = process(unprocessedData(dataIdx));
end

This requires that any objects returned by process() have a valid zero-argument constructor, because on the first write MATLAB creates the whole processedData array and fills the remaining elements with default-constructed objects.

mlint still complains about possible array growth, but I think that's because it can't recognise the reversed loop iteration...
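A self-contained sketch of the reversed loop using structs, in case anyone wants to try it without fetchData() and process(). The rawValues input and the 'value' and 'squared' fields are hypothetical and chosen only for illustration.

```matlab
rawValues = [3 1 4 1 5];   % hypothetical input

clear processedData        % make sure no earlier variable of that name is reused

for dataIdx = numel(rawValues) : -1 : 1
    % The first pass writes element 5, so MATLAB creates all five slots at once;
    % later passes only overwrite existing elements.
    processedData(dataIdx) = struct('value', rawValues(dataIdx), ...
                                    'squared', rawValues(dataIdx)^2);
end
```

After the first iteration processedData is already a 1-by-5 struct array whose first four elements have empty fields.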


Solution

  • Since you know the fields of the structure processedData and you know its length, one way would be the following:

    unprocessedData = fetchData();
    processedData = struct('field1', [], ...
                           'field2', []); % template struct with the expected fields
    processedData(length(unprocessedData)) = processedData(1); % grow once to the required length
    for dataIdx = 1:length(unprocessedData)
        processedData(dataIdx) = process(unprocessedData(dataIdx));
    end
    

    This assumes that the process function returns a struct with the same fields as processedData.
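As a variation on the same idea (not part of the answer above), the template struct can also be replicated with repmat. This is only a sketch: rawValues and the values stored in field1 and field2 are hypothetical stand-ins for fetchData() and process().

```matlab
rawValues = [3 1 4 1 5];                        % hypothetical input
n = numel(rawValues);

template = struct('field1', [], 'field2', []);  % one struct with the expected fields
processedData = repmat(template, n, 1);         % n-by-1 struct array, allocated once

for dataIdx = 1:n
    % Stand-in for process(): returns a struct with the same fields as the template.
    processedData(dataIdx) = struct('field1', rawValues(dataIdx), ...
                                    'field2', rawValues(dataIdx)^2);
end
```

The same caveat applies: each struct assigned inside the loop must have the same fields as the template.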