matlab, statistics, cross-validation

How to split data into k folds NOT randomly in MATLAB?


I have a dataset, for simplicity let's say it has 1000 samples (each is a vector).

I want to split my data into train and test sets for cross-validation, NOT randomly (1). For example, for 4-fold cross-validation I should get:

fold1: train = 1:250, test = 251:1000
fold2: train = 251:500, test = [1:250, 501:1000]
fold3: train = 501:750, test = [1:500, 751:1000]
fold4: train = 751:1000, test = 1:750

I am aware of cvpartition, but AFAIK it splits the data randomly, which is not what I need.

I guess I can write the code for it, but I figured there is probably a function I could use.


(1) The data is already shuffled and I need to be able to easily reproduce the experiments.


Solution

  • Here is a function that does it in general:

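    function [test, train] = kfolds(data, k)
      % Split the rows of data into k contiguous (non-random) folds.
      % test{f} is the f-th block of rows; train{f} is all remaining rows.

      n = size(data,1);

      % Preallocate k-by-1 cell arrays, one cell per fold
      test = cell(k,1);
      train = cell(k,1);

      % Rows per test fold; the last mod(n,k) rows never land in a test fold
      chunk = floor(n/k);

      for f = 1:k
          lo = (f-1)*chunk + 1;   % first row of test fold f
          hi = f*chunk;           % last row of test fold f
          test{f} = data(lo:hi,:);
          train{f} = [data(1:lo-1,:); data(hi+1:end,:)];
      end
    end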
    

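    It's not an elegant one-liner, but it's fairly robust: it doesn't need k to be a factor of your number of samples, works on a 2D matrix, and outputs the actual sets rather than indices. Two things to keep in mind. When k does not divide the number of samples, the leftover rows at the end always stay in the training sets and never appear in a test set. Also, the contiguous chunk is returned as the test set here; if you want the chunk to be the training set, as in the question, swap the two outputs.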
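
If you only need the indices, as in the folds listed in the question, here is a minimal sketch of the same contiguous split done on index ranges. The variable names (edges, testIdx, trainIdx) are illustrative and not part of any toolbox, and using round(linspace(...)) for the fold boundaries is an assumption made so that any remainder is spread across the folds.

```matlab
% Sketch: contiguous (non-random) k-fold indices. Names are illustrative.
n = 1000;   % number of samples (value taken from the question)
k = 4;      % number of folds

% Fold boundaries: fold f covers edges(f)+1 through edges(f+1).
% Rounding spreads any remainder over the folds when k does not divide n.
edges = round(linspace(0, n, k+1));

testIdx  = cell(k,1);
trainIdx = cell(k,1);
for f = 1:k
    testIdx{f}  = edges(f)+1 : edges(f+1);          % contiguous block f
    trainIdx{f} = [1:edges(f), edges(f+1)+1:n];     % every other index
end
```

The folds can then be pulled out with something like data(testIdx{f},:) and data(trainIdx{f},:). Because nothing is random, rerunning the script gives the same folds every time. To make the block the training set instead, as in the question, just swap the roles of the two index lists.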