Search code examples
matlabdownsampling

Filter table rows Matlab by matching non-identical numbers as close as possible


This may have a simpler solution than my current approach, so any recommendations welcome!

I have a MATLAB table (38327x17) of char/double variables imported from .csv with readtable(), double converted from char. The double variable in col 2 is timepoints with a sample rate of around ~0.02 seconds, looks a bit like this:

char double char char char etc.  
x 1.0088 y w z  
x 1.0218 y w e  
x 1.1017 w y z  
x 1.1287 w q z  

I also have a vector of timepoints (238x1) with a sample rate of 2.531 seconds:

[0; 2.531; 5.062; 7.593; ...]

I want to filter the table by the vector, effectively downsampling the table timepoints i.e. extract 238 rows of the table by timepoints that best match each timepoint in the vector.

Desired outcome would be:

char double char char char etc.  
x 0 y w z  
x 2.7816 y w e  
x 5.0607 w y z  
x 7.6064 w q z  

(where the values are as close to the vector as possible, but they do not identically match due to mismatched sampling intervals). I tried resample() and downsample() but didn't get very far. I could iterate through the csv file line by line and string match, but seems inefficient.


Solution

  • You can minimize the "outer difference" (is there a better name for this?), in magnitude, between the two time vectors:

    % Create example data
    v=(0:60).'*2.531; %column vector
    M=(1:ceil(v(end)/0.49))*0.49;
    chars='a':'z';
    M=table( repmat('x',[numel(M) 1]), M.', chars(mod(randperm(numel(M)),numel(chars))+1).', ...
      'VariableNames', {'char1', 'double1', 'char2'} );
    
    % Find desired rows
    [~,idx]=min(abs( M.double1 - v.' )); %relies on implicit expansion
    M(idx,:),
    

    In your case, the temporary matrix with differences would be 38327 x 238, or about 70 MB, so this algorithm should be fine.