I am working with biological data, genes, that have multiple characteristics which I want to have reflected properly in my training and test data.
However, the initial_split function only accepts a single strata column. Is there a good way to create an initial split of my data that stratifies on multiple variables? Preferably using tidymodels / tidyverse.
Thank you!
You would have to make a composite column to stratify on. We've confined the strata to one column on purpose: once you cross several variables, the sample size in each resulting stratum can get very small and you may not be able to stratify.
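As a rough sketch of the composite-column idea: the `genes` data and the `biotype` / `chrom_arm` columns below are made up for illustration, so substitute your own characteristics. The two variables are pasted into one column, which is then passed to `strata`.

```r
library(dplyr)
library(rsample)

set.seed(123)

# Hypothetical gene table; the column names are assumptions for this example
genes <- tibble(
  gene_id   = paste0("gene_", 1:400),
  biotype   = sample(c("coding", "noncoding"), 400, replace = TRUE),
  chrom_arm = sample(c("p", "q"), 400, replace = TRUE)
) %>%
  # Composite column: one level per combination of the two characteristics
  mutate(strata_combo = paste(biotype, chrom_arm, sep = "_"))

# Stratify on the single composite column
split <- initial_split(genes, prop = 0.75, strata = strata_combo)
train <- training(split)
test  <- testing(split)

# Check how the combinations are represented in each set
count(train, strata_combo)
count(test, strata_combo)
```

It is worth checking the counts per combination before splitting; combinations with only a handful of genes are where stratification tends to break down.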
Another approach that you can use (and that I will eventually add a PR for) is twinning, which has a corresponding R package.
If you still want an initial_split object, you can make one with rsample::make_splits using the twinning results.
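A minimal sketch of building a split object from row indices you already have. The `test_idx` vector here is only a random placeholder standing in for whatever indices twinning gives you, and the `genes` data is made up. Note that the indices need to be integers.

```r
library(rsample)

set.seed(42)

# Hypothetical data for illustration
genes <- data.frame(
  gene_id    = paste0("gene_", 1:100),
  expression = rnorm(100)
)

# Placeholder indices: in practice these would come from the twinning step
test_idx  <- sort(sample.int(nrow(genes), size = 25))
train_idx <- setdiff(seq_len(nrow(genes)), test_idx)

# Build a split object from the two sets of integer row indices
custom_split <- make_splits(
  list(analysis = train_idx, assessment = test_idx),
  data = genes
)

# analysis() / assessment() extract the two partitions
train <- analysis(custom_split)
test  <- assessment(custom_split)
```

The resulting object can then be used with the rest of rsample in place of the split you would otherwise have created directly.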