
In the Orange data mining toolkit, how do I specify groups for cross-validation?


I'm using the Orange GUI and trying to perform cross-validation. My data has 8 different groups (specified by a variable in the input data), and I'd like each fold to hold out a different group. Is this possible in Orange? I can select the number of folds for cross-validation, but I don't see any way to control which rows go into each fold.


Solution

  • Cross-validation in the Test & Score widget assigns rows to folds by random sampling, so I don't think what you're after is possible out of the box.

    If you really want to honor the split you defined beforehand (via an input variable) and don't mind some manual work, use the Select Rows widget to select the rows of one group. Send its Matching Data output to Test & Score as Test Data and its Unmatched Data output (all the other groups) as Data, then choose "Test on test data" in Test & Score. This gives you the result for a single fold, i.e. one held-out group. Repeat for each of the 8 groups and average the scores to obtain the overall result.
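    If the groups differ in size, "average" can mean two slightly different things. Writing $n_g$ for the number of rows in group $g$ and $c_g$ for the number of those rows predicted correctly when $g$ is held out, the plain mean of the 8 per-fold accuracies and the pooled accuracy over all held-out rows are:

```latex
\overline{\mathrm{CA}}_{\text{per-fold}} = \frac{1}{8}\sum_{g=1}^{8} \frac{c_g}{n_g},
\qquad
\mathrm{CA}_{\text{pooled}} = \frac{\sum_{g=1}^{8} c_g}{\sum_{g=1}^{8} n_g}
```

    The two agree when every group has the same size; otherwise the per-fold mean gives each group equal weight, while the pooled value weights groups by their number of rows.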

    If you know some Python, you can always fall back to Orange's scripting layer.
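    If you go the scripting route, the hold-one-group-out loop from the manual workflow is short. The sketch below is plain Python, not Orange's API: rows are hypothetical `(features, label, group)` tuples, and `majority_learner` is a hypothetical stand-in for a real learner (it just predicts the most common training label). With Orange you would substitute a real data table and learner; the fold logic stays the same.

```python
from collections import Counter

def majority_learner(train_rows):
    """Hypothetical placeholder learner: always predicts the most common training label."""
    most_common = Counter(label for _, label, _ in train_rows).most_common(1)[0][0]
    return lambda features: most_common

def leave_one_group_out(rows, learn):
    """Hold out each group in turn; return {group: accuracy on that held-out group}."""
    groups = sorted({group for _, _, group in rows})
    scores = {}
    for held_out in groups:
        train = [r for r in rows if r[2] != held_out]  # like Select Rows "Unmatched Data"
        test = [r for r in rows if r[2] == held_out]   # like Select Rows "Matching Data"
        model = learn(train)
        correct = sum(model(x) == y for x, y, _ in test)
        scores[held_out] = correct / len(test)
    return scores

# Hypothetical toy data: (features, class label, group)
rows = [
    ([1.0], "a", "g1"), ([2.0], "a", "g1"),
    ([3.0], "a", "g2"), ([4.0], "b", "g2"),
    ([5.0], "b", "g3"), ([6.0], "b", "g3"), ([7.0], "b", "g3"),
]

scores = leave_one_group_out(rows, majority_learner)
mean = sum(scores.values()) / len(scores)  # unweighted mean over folds
print(scores)  # {'g1': 0.0, 'g2': 0.5, 'g3': 0.0}
print(mean)
```

    Each group is used exactly once as the test set, so the number of folds equals the number of distinct groups (8 in your case).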