Trying to understand how MATLAB's fillmissing() processes gaps when using a custom fillfun


I was using MATLAB's fillmissing function to replace missing values in some data using a custom function, and I ran into trouble filling data at the locations that I thought the function should treat as 'endvalues'. It's fairly clear from the documentation that the fillfun method uses a different moving-window definition than the movmean or movmedian methods: those define the window around each individual missing element, while fillfun defines the window around each gap as a whole.
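
To make the distinction concrete, here is a minimal sketch on a made-up vector (x, byElement, byGap, and fillWithMean are names chosen for illustration, and the [1 1] window is an arbitrary choice). It fills the same interior gap once with 'movmean' and once with a custom function that averages whatever data the window supplies. No expected output is shown, since the results may differ between releases.

```matlab
% Made-up test vector with one interior gap of three missing values.
x = [1 2 NaN NaN NaN 6 7];

% 'movmean': the [1 1] window is placed around each missing element.
byElement = fillmissing(x, 'movmean', [1 1]);

% Custom fill function: per the documentation, the window is placed around
% the gap as a whole. xs holds the data values in the window, and tq holds
% the sample points of the missing entries being filled.
fillWithMean = @(xs, ts, tq) mean(xs) * ones(size(tq));
byGap = fillmissing(x, fillWithMean, [1 1]);

disp(byElement)
disp(byGap)
```

Comparing the two printed rows shows whether every element of the gap gets the same treatment or whether elements far from the gap edges are handled differently.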

Consider the following test array, A, whose columns each contain a different pattern of missing values (fillmissing processes each column separately):

>> A = reshape(1:20,5,4);
>> A([1,7,8,9,11:15,20])=NaN
A =
   NaN     6   NaN    16
     2   NaN   NaN    17
     3   NaN   NaN    18
     4   NaN   NaN    19
     5    10   NaN   NaN

With the simple fill methods, things are straightforward:

>> fillmissing(A, 'constant', 0)
ans =
     0     6     0    16
     2     0     0    17
     3     0     0    18
     4     0     0    19
     5    10     0     0
>> fillmissing(A, 'constant', 0, 'endvalues',99)
ans =
    99     6    99    16
     2     0    99    17
     3     0    99    18
     4     0    99    19
     5    10    99    99

Now if I use a simple test function, @(x,y,z) z, which should fill each gap with the sample-point locations of its missing entries (the default sample points are [1 2 3 4 5]), the behavior gets odd:

>> fillmissing(A, @(x,y,z) z, 2)
ans =
     1     6     1    16
     2     2     2    17
     3     3     3    18
     4     4     4    19
     5    10     5     5

>> fillmissing(A, @(x,y,z) z, 2, 'endvalues',99)
ans =
    99     6     1    16
     2     2     2    17
     3     3     3    18
     4     4     4    19
     5    10     5    99

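Comparing the 'endvalues' results for 'constant' and for the custom function, it seems that what counts as 'endvalues' is not the same across methods: with 'constant', the all-NaN third column is filled with the endvalues constant, but with the custom function it is not. Further, fillmissing seems to skip leading or trailing gaps (apart from the all-NaN column) when the window contains no data points, whether or not the function is able to fill those values: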

>> fillmissing(A, @(x,y,z) z, [2 0])
ans =
   NaN     6     1    16
     2     2     2    17
     3     3     3    18
     4     4     4    19
     5    10     5     5

>> fillmissing(A, @(x,y,z) z, [0 2])
ans =
     1     6     1    16
     2     2     2    17
     3     3     3    18
     4     4     4    19
     5    10     5   NaN

>> fillmissing(A, @(x,y,z) z, [0 2], 'endvalues', 99)
ans =
    99     6     1    16
     2     2     2    17
     3     3     3    18
     4     4     4    19
     5    10     5    99

>> fillmissing(A, @(x,y,z) z, [0 2], 'endvalues', 'extrap')
ans =
     1     6     1    16
     2     2     2    17
     3     3     3    18
     4     4     4    19
     5    10     5   NaN
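
One way to probe this is to have the fill function report how much data it was given. The sketch below (countFill and counts are illustrative names) fills each gap with the number of values passed in xs; any gap that is still NaN afterwards was presumably never handed to the function at all. The output is omitted because it depends on the release.

```matlab
A = reshape(1:20,5,4);
A([1,7,8,9,11:15,20]) = NaN;

% Fill every entry of a gap with the number of data values in its window.
countFill = @(xs, ts, tq) numel(xs) * ones(size(tq));

counts = fillmissing(A, countFill, [2 0]);
disp(counts)

% Gaps that the fill function did not fill remain NaN.
disp(isnan(counts))
```

Swapping [2 0] for [0 2] or a scalar window repeats the check for the other cases above.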

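Lastly, the interaction with 'SamplePoints' is unclear. Passing the default sample points explicitly changes nothing, but shifting them by 10 shifts the fill values in every gap except the all-NaN third column: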

>> fillmissing(A, @(x,y,z) z, 2, 'SamplePoints', [1 2 3 4 5])
ans =
     1     6     1    16
     2     2     2    17
     3     3     3    18
     4     4     4    19
     5    10     5     5

>> fillmissing(A, @(x,y,z) z, 2, 'SamplePoints', [1 2 3 4 5]+10)
ans =
    11     6     1    16
     2    12     2    17
     3    13     3    18
     4    14     4    19
     5    10     5    15

That looks like it may be a bug in the way SamplePoints is handled. Can anyone clarify the expected behavior? Am I missing something? If there is clearer documentation somewhere for this fill method, I would appreciate any pointers.

(Tested using MATLAB 2021b, if that matters. Update: verified that the above behavior persists in MATLAB 2022b.)

(Edit: see the answer below, which shows the changed function output as tested in MATLAB 2023a.)


Solution

  • As of MATLAB 2023a, the inconsistencies posted above when using a custom fillfun with fillmissing appear to have been resolved. The same commands now produce the output below.

    The simple cases are unchanged:

    >> fillmissing(A, 'constant', 0)
    ans =
         0     6     0    16
         2     0     0    17
         3     0     0    18
         4     0     0    19
         5    10     0     0
    >> fillmissing(A, 'constant', 0, 'endvalues',99)
    ans =
        99     6    99    16
         2     0    99    17
         3     0    99    18
         4     0    99    19
         5    10    99    99
    

    The fillfun method now treats the all-NaN column as endvalues, consistent with the 'constant' method above:

    >> fillmissing(A, @(x,y,z) z, 2)
    
    ans =
    
         1     6     1    16
         2     2     2    17
         3     3     3    18
         4     4     4    19
         5    10     5     5
    
    >> fillmissing(A, @(x,y,z) z, 2, 'endvalues',99)
    
    ans =
    
        99     6    99    16
         2     2    99    17
         3     3    99    18
         4     4    99    19
         5    10    99    99
    

    Endvalues are now also handled consistently when the window is given as a two-element range:

    >> fillmissing(A, @(x,y,z) z, [2 0])
    
    ans =
    
         1     6     1    16
         2     2     2    17
         3     3     3    18
         4     4     4    19
         5    10     5     5
    
    >> fillmissing(A, @(x,y,z) z, [0 2])
    
    ans =
    
         1     6     1    16
         2     2     2    17
         3     3     3    18
         4     4     4    19
         5    10     5     5
    
    >> fillmissing(A, @(x,y,z) z, [0 2], 'endvalues', 99)
    
    ans =
    
        99     6    99    16
         2     2    99    17
         3     3    99    18
         4     4    99    19
         5    10    99    99
    
    >> fillmissing(A, @(x,y,z) z, [0 2], 'endvalues', 'extrap')
    
    ans =
    
         1     6     1    16
         2     2     2    17
         3     3     3    18
         4     4     4    19
         5    10     5     5
    

    The SamplePoints interaction is now consistent as well, with fill values taken from the actual sample-point values rather than the element positions:

    >> fillmissing(A, @(x,y,z) z, 2, 'SamplePoints', [1 2 3 4 5])
    
    ans =
    
         1     6     1    16
         2     2     2    17
         3     3     3    18
         4     4     4    19
         5    10     5     5
    
    >> fillmissing(A, @(x,y,z) z, 2, 'SamplePoints', [1 2 3 4 5]+10)
    
    ans =
    
        11     6    11    16
         2    12    12    17
         3    13    13    18
         4    14    14    19
         5    10    15    15
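
    For anyone who wants to check their own release, the comparison above can be scripted. This is only a sketch: t, expected, gap, and usesSamplePoints are illustrative names, and the script reports the result rather than asserting it, since older releases are expected to print 0.

    ```matlab
    A = reshape(1:20,5,4);
    A([1,7,8,9,11:15,20]) = NaN;

    t = (1:5) + 10;   % shifted sample points, as in the question
    B = fillmissing(A, @(x,y,z) z, 2, 'SamplePoints', t);

    % Sample point of each row, replicated across the columns of A.
    expected = repmat(t(:), 1, size(A,2));

    % Compare only the entries that were missing in A.
    gap = isnan(A);
    usesSamplePoints = isequaln(B(gap), expected(gap));
    fprintf('Filled values match SamplePoints: %d\n', usesSamplePoints);
    ```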