I'm using TensorFlow.js in Node.js with an Nvidia CUDA-capable GPU (note that this is NOT Python), and I have implemented an AI model. I have a Dataset object that represents the input data I would like to train my model on.
However, I would like to do an 80% - 20% split on my data, with 80% used for training, and 20% used for validation.
In the .fitDataset() method, the validationData setting is present for specifying validation data.
Unfortunately though, I have just a single Dataset object that represents my entire dataset.
Additionally, my training data is both temporal and extremely large, and my Dataset object is backed by a Generator function. To this end, I'd like the last 20% of the Dataset object to act as my validation data.
What's the most efficient way to split a single Dataset object in 2 without loading it all into memory such that I can use the last 20% of it as validation data?
The dataset is not loaded into memory all at once; it is read iteratively. A filter can therefore be applied to the elements as they are loaded to split them into two datasets:
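const tf = require('@tensorflow/tfjs-node-gpu');
// First create the dataset (lazy; nothing is read yet). csvUrl is the location of your data.
const csvDataset = tf.data.csv(csvUrl);
// Split the dataset: every fifth element goes to testDataset, the rest to trainDataset.
// Each filter gets its own counter, because the filters run lazily during iteration
// and a shared counter would already have been advanced by the other dataset.
// The counters are not reset between passes, so the split only stays stable
// across epochs if the element count is a multiple of 5.
let trainIndex = 0;
const trainDataset = csvDataset.filter(() => trainIndex++ % 5 !== 0);
let testIndex = 0;
const testDataset = csvDataset.filter(() => testIndex++ % 5 === 0);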
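Note that the filter approach takes every fifth element for validation, not the last 20%. If the total number of elements is known in advance (an assumption; a generator-backed Dataset does not report its own length), a contiguous split can be sketched with the Dataset methods take() and skip(). The helper name splitDataset and the totalCount parameter below are made up for illustration:

```javascript
// Minimal sketch: split a lazily evaluated dataset into a leading training
// part and a trailing validation part without materializing it.
// `dataset` only needs take(n) and skip(n), which tf.data.Dataset provides.
// `totalCount` is an assumption: the element count must be known up front.
function splitDataset(dataset, totalCount, trainFraction = 0.8) {
  const trainCount = Math.floor(totalCount * trainFraction);
  return {
    trainCount,
    trainDataset: dataset.take(trainCount),      // first 80% by default
    validationDataset: dataset.skip(trainCount), // everything after that
  };
}
```

With a generator-backed dataset this might be used as splitDataset(tf.data.generator(myGenerator), totalCount), passing trainDataset to model.fitDataset() and validationDataset as its validationData option (batch both first if the model expects batches). As far as I know, skip() still has to pull and discard the first 80% of elements from the generator on every validation pass, so if the generator can start at an arbitrary offset, building two separate generators is likely cheaper.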