I have a classification problem with a large DATASET of 308,500 rows. I want to split it into a train set and a test set in order to build a model.
However, I want the train set to be sampled from DATASET at a fixed interval, for example one row every 1,000 rows, so that I know the train set is drawn from across the whole DATASET. Is there a way to do this?
For example, I'd like something like this:
train = DATASET[take sample every 1000 rows]
You can use seq() to create the indices of the rows to subset.
train_inds <- seq(1, nrow(DATASET), 1000)
train <- DATASET[train_inds, ]
test <- DATASET[-train_inds, ]
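To check what this split produces, here is a minimal sketch on a made-up stand-in for DATASET. The columns x and label, the seed, and the step variable are assumptions for illustration only; just the row count of 308,500 comes from the question.

```r
# Hypothetical stand-in for DATASET: 308,500 rows with one feature and a label.
set.seed(42)
DATASET <- data.frame(
  x = rnorm(308500),
  label = sample(c("a", "b"), 308500, replace = TRUE)
)

step <- 1000  # assumed sampling interval: one row out of every 1,000

# Rows 1, 1001, 2001, ... go to the train set; all other rows go to the test set.
train_inds <- seq(1, nrow(DATASET), by = step)
train <- DATASET[train_inds, ]
test <- DATASET[-train_inds, ]

nrow(train)  # 309
nrow(test)   # 308191
```

Note that taking one row every 1,000 leaves only 309 rows for training. A smaller step gives a larger train set while still covering the whole DATASET.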