
How to split data into a train set (and test set) by taking every n-th row in R?


I've got a classification problem with a huge DATASET containing 308,500 rows. I want to split these data into a train set and a test set in order to build a model.

However, I want the training set to sample the DATASET every n rows (for example, every 1,000 rows), so that I know the training set is built from rows spread across the whole DATASET. Is there a way to do this?

For example, I'd like something like this:

train = DATASET[take sample every 1000 rows]

Solution

  • You can use seq() to create the indices of the rows to subset.

    # Row numbers 1, 1001, 2001, ... up to nrow(DATASET)
    train_inds <- seq(1, nrow(DATASET), by = 1000)
    train <- DATASET[train_inds, ]   # every 1000th row
    test  <- DATASET[-train_inds, ]  # all remaining rows
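A minimal sketch that wraps the seq() approach above in a reusable function. The helper name split_every_n, its start offset argument, and the toy data frame are hypothetical, introduced only for illustration; with start = 1 it selects the same rows as the answer's code.

```r
# Hypothetical helper: systematic split that puts every n-th row in the
# training set, starting at row `start` (start = 1 matches the seq() answer).
split_every_n <- function(data, n, start = 1) {
  stopifnot(n >= 1, start >= 1, start <= n, start <= nrow(data))
  train_inds <- seq(start, nrow(data), by = n)
  list(
    train = data[train_inds, , drop = FALSE],   # every n-th row
    test  = data[-train_inds, , drop = FALSE]   # everything else
  )
}

# Toy stand-in for DATASET (the real one has 308,500 rows)
set.seed(42)
toy <- data.frame(id = 1:10500, x = rnorm(10500))

parts <- split_every_n(toy, n = 1000)
nrow(parts$train)  # 11 rows: ids 1, 1001, ..., 10001
nrow(parts$test)   # 10489 rows

# Random offset, so that row 1 is not always in the training set
parts_rand <- split_every_n(toy, n = 1000, start = sample.int(1000, 1))
```

Note that with the real DATASET and n = 1000 the training set gets only 309 rows (1, 1001, ..., 308001) and the test set the remaining 308,191. If a larger training fraction is wanted, the roles can be swapped: with n = 5, the returned train element holds every 5th row (about 20%) and could serve as the held-out set, while the returned test element (about 80%) becomes the training data.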