matlab time-series missing-data imputation

How can I make MATLAB's fillmissing function impute only a certain number of missing values between known values?


Consider this code as an example:

A = [NaN NaN NaN NaN 9; NaN NaN 2 5 7; NaN 3 4 NaN 9; 11 NaN 12 NaN 14; 44 5 15 12 nan];
dates = datetime({'2010','2011','2012','2013','2014'},'InputFormat','yyyy')';
TT = array2timetable(A,'RowTimes',dates);

The resulting timetable is: [image: starting timetable with undesired missing values]

I would like to use the MATLAB function fillmissing to impute missing data according to the following rules:

  • missing data at the beginning of the time series should not be imputed
  • missing data at the end of the time series should not be imputed
  • missing data between known values should be imputed only if the number of missing values between those known values is strictly less than 2

The resulting timetable should be: [image: the final table with desired missing data imputation]

Notice that only the 4th row of column A2 has been imputed here. Can I do that with fillmissing? If not, how else can I do it?


Solution

  • You can find the first and last non-NaN values using find. Based on these indices, you can conditionally fill the missing data between them if that range contains fewer than 2 missing values. For some vector v:

    idxNaN = isnan( v );                        % Logical mask of NaN entries
    idxDataStart = find( ~idxNaN, 1, 'first' ); % Index of first non-NaN value
    idxDataEnd =   find( ~idxNaN, 1, 'last' );  % Index of last non-NaN value
    idxData = idxDataStart:idxDataEnd;          % Range spanning the valid data
    numValsMissing = nnz( idxNaN(idxData) );    % Number of NaNs inside that range
    if numValsMissing < 2 % Check for max number of NaNs
        v(idxData) = fillmissing(v(idxData), 'linear'); % Fill interior NaNs only
    end
    

    For your array A you can loop over the columns and apply the above, where each column is a vector v.

    A = [NaN NaN NaN NaN 9; NaN NaN 2 5 7; NaN 3 4 NaN 9; 11 NaN 12 NaN 14; 44 5 15 12 nan];
    
    for ii = 1:size(A,2)
        v = A(:,ii);                                 % Current column
        idxNaN = isnan( v );
        idxDataStart = find( ~idxNaN, 1, 'first' );  % First non-NaN value
        idxDataEnd =   find( ~idxNaN, 1, 'last' );   % Last non-NaN value
        idxData = idxDataStart:idxDataEnd;
        numValsMissing = nnz( idxNaN(idxData) );     % NaNs between first and last value
        if numValsMissing < 2
            v(idxData) = fillmissing(v(idxData),'linear');
        end
        A(:,ii) = v;                                 % Write the column back
    end
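
Note that the loop above counts every NaN between the first and last valid value of a column, so a column such as [1 NaN 2 NaN 3] would be left untouched even though each individual gap holds a single NaN. If the rule is meant per gap, a minimal sketch along the following lines might work. It is not part of the original answer: the names maxRun, runStart, runEnd and B are hypothetical, and linear interpolation via interp1 is assumed as the fill method.

```matlab
A = [NaN NaN NaN NaN 9; NaN NaN 2 5 7; NaN 3 4 NaN 9; 11 NaN 12 NaN 14; 44 5 15 12 NaN];
maxRun = 1;   % hypothetical parameter: longest interior NaN run that may be filled

B = A;        % result goes into a copy so A stays available for comparison
for ii = 1:size(A,2)
    v = A(:,ii);
    isMissing = isnan(v);

    % Locate runs of consecutive NaNs: +1 marks a run start, -1 the element after a run
    d = diff([0; isMissing; 0]);
    runStart = find(d == 1);
    runEnd   = find(d == -1) - 1;

    for k = 1:numel(runStart)
        s = runStart(k);
        e = runEnd(k);
        isInterior = s > 1 && e < numel(v);   % skip leading and trailing runs
        if isInterior && (e - s + 1) <= maxRun
            % Interpolate linearly between the known neighbours of this run
            v(s:e) = interp1([s-1; e+1], v([s-1; e+1]), (s:e)');
        end
    end
    B(:,ii) = v;
end
```

For the example matrix this should change only row 4 of the second column, matching the desired result in the question. The filled values can be written back into the timetable with TT{:,:} = B. Recent MATLAB releases also document 'EndValues' and 'MaxGap' options for fillmissing that may achieve the same in a single call; how the gap size is measured there (especially with datetime row times) is not verified here, so check the documentation for your release.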