I have a long signal containing some intervals of NaNs. When I use the MATLAB function fillmissing
to fill the NaNs, some NaNs are still left.
signal = rand(10000, 1);
signal(500: 700) = NaN;
signalInterpolate = fillmissing(signal, "movmean", 50);
sum(isnan(signalInterpolate))
signalInterpolate = fillmissing(signal, "movmean", 100);
sum(isnan(signalInterpolate))
signalInterpolate = fillmissing(signal, "movmean", 200);
sum(isnan(signalInterpolate))
signalInterpolate = fillmissing(signal, "movmean", 202);
sum(isnan(signalInterpolate))
The result is:
ans =
152
ans =
102
ans =
2
ans =
0
Once I increase the window size far enough, there are no NaNs left in the result. It seems the window size needs to be larger than the longest run of consecutive NaNs in the signal. Is there a way to avoid leftover NaNs with a smaller window size?
From the docs:
F = fillmissing(A,movmethod,window)
fills missing entries using a moving window mean or median with window length window. For example, fillmissing(A,'movmean',5) fills data with a moving mean using a window length of 5.
You have created example data with 201 consecutive NaN values.
When you use "movmean", 202 there is always at least one non-NaN value in every moving window, so every moving average is non-NaN and all NaN values can be overwritten with it.
However, with any smaller window there will be windows where every value is NaN, and MATLAB can't infer what value you expect the gap to be filled with.
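If you do want a single pass, the smallest window that works can be derived from the data. This is a minimal sketch, assuming the gaps lie in the interior of the signal (a gap touching either end is only seen from one side and needs a wider window). The names nanMask, edges and longestRun are illustrative, not part of any MATLAB API.

```matlab
signal = rand(10000, 1);
signal(500:700) = NaN;                        % 201 consecutive NaNs

% Mark NaNs, then locate where each run of NaNs starts and ends.
nanMask   = isnan(signal(:)).';               % row vector of logicals
edges     = diff([0, nanMask, 0]);            % +1 at a run start, -1 just past a run end
runStarts = find(edges == 1);
runEnds   = find(edges == -1) - 1;
longestRun = max(runEnds - runStarts + 1);    % empty if there are no NaNs

% One more than the longest run guarantees a real value in every window.
filled = fillmissing(signal, 'movmean', longestRun + 1);
nnz(isnan(filled))
```

For the data in the question, longestRun is 201, so the window comes out as 202, matching the value found by trial and error.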
Imagine a smaller example with 4 consecutive NaNs, filled using "movmean", 3:
Input:  [1, 2, NaN, NaN, NaN, NaN, 3, 4, 5]
Window of 3 around the first NaN  = [2, NaN, NaN]   <- average of the non-NaN values is 2
Window of 3 around the second NaN = [NaN, NaN, NaN] <- nothing to average, so it stays NaN
Output: [1, 2, 2, NaN, NaN, 3, 3, 4, 5]
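The small example can be checked directly. This sketch just runs fillmissing on that vector with a window of 3; the variable names are illustrative.

```matlab
x = [1, 2, NaN, NaN, NaN, NaN, 3, 4, 5];

% Only NaNs with at least one real value inside their window get filled.
y = fillmissing(x, 'movmean', 3);
disp(y)                 % 1 2 2 NaN NaN 3 3 4 5

% The two middle NaNs remain because their windows are entirely NaN.
remaining = nnz(isnan(y))
```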
You either need to define what these values should be by using one of the other options in fillmissing, or you could repeatedly fill using a moving mean and a smaller window (each pass shrinks the run of consecutive NaNs by one less than the window size, as the counts in your output show). This will result in something a bit like using nearest instead of movmean, with some smoothing in the middle.
signal = rand(1,1e3)*0.1 + linspace(0,1,1e3); % Create some dummy data
signal(400:600) = NaN;
signal(500:end) = signal(500:end) + 0.3; % introduce a step change during the NaNs
% repeated in-fill of missing values
while nnz(isnan(signal)) > 0
    signal = fillmissing( signal, 'movmean', 25 );
end
Here's a comparison with just using nearest, which will be faster and arguably makes more sense as you're "inventing" less data
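A rough way to reproduce that comparison yourself is sketched below. It rebuilds the same dummy data and fills the gap once with 'nearest' and once with 'linear' (both are documented fillmissing methods); the fixed seed and the variable names are assumptions added for repeatability.

```matlab
rng(0);                                            % fixed seed, for repeatability only
signal = rand(1,1e3)*0.1 + linspace(0,1,1e3);      % same dummy data as above
signal(400:600) = NaN;
signal(500:end) = signal(500:end) + 0.3;           % step change hidden inside the gap

% Single-pass alternatives that never leave NaNs inside an interior gap.
filledNearest = fillmissing(signal, 'nearest');    % copy the closest real sample
filledLinear  = fillmissing(signal, 'linear');     % straight line across the gap

nnz(isnan(filledNearest))
nnz(isnan(filledLinear))
```

Plotting filledNearest and filledLinear against the repeated moving-mean result shows how much each method "invents": nearest holds the edge values, while linear draws a ramp across the gap.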