I would like to find the best hyperparameters for my model, but tuning 6 hyperparameters over a total of 486 permutations and 200k documents takes a while. That's why I'm thinking about using the free credits on AWS. Ideally I want to run my script and get a .csv file as output.
vector_size = [100, 200, 300]
window = [2, 5, 10]
epochs = [10, 20, 30]
count = [2, 5, 10]
dm = [0, 1]
sample = [10e-4, 10e-5, 10e-6]
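For bookkeeping, the grid above expands to exactly 486 permutations with the standard library alone. A minimal sketch (the dict keys follow the names in the question, and `results.csv` is just an illustrative filename; in gensim's Doc2Vec, `count` would correspond to `min_count`):

```python
import csv
import itertools

# The grid from above, collected in one place.
grid = {
    "vector_size": [100, 200, 300],
    "window": [2, 5, 10],
    "epochs": [10, 20, 30],
    "count": [2, 5, 10],
    "dm": [0, 1],
    "sample": [10e-4, 10e-5, 10e-6],
}

combos = list(itertools.product(*grid.values()))
print(len(combos))  # 486 = 3 * 3 * 3 * 3 * 2 * 3

# One CSV row per permutation; a `score` column gets filled in
# as each training run finishes.
with open("results.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(list(grid) + ["score"])
    for params in combos:
        writer.writerow(params)
```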
The problem is that I've never used AWS and the amount of different services is overwhelming. Can you guys give me a hint which service is suitable for my problem?
EC2 is one of the original core services that gives you a virtual system in the cloud, with a variety of CPU/RAM options, to run anything you want. You could, with effort, fire up 486 nodes to train & evaluate each model in parallel, saving aside the results and shutting down each node as soon as its run finishes.
(There might be a newer higher-level service which offers some other sort of assistance with job-management, but EC2 is the original generic node-in-the-cloud.)
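Even before splitting across many nodes, a single large EC2 instance lets you fan the 486 runs out across its cores. A rough sketch using only the standard library, where `train_and_score` is a hypothetical placeholder for your actual Doc2Vec training and evaluation:

```python
import csv
import itertools
from multiprocessing import Pool

def train_and_score(params):
    """Hypothetical placeholder: train a model with these
    parameters and return an evaluation score."""
    vector_size, window, epochs, count, dm, sample = params
    # ... real training & evaluation would go here ...
    return 0.0

if __name__ == "__main__":
    combos = list(itertools.product(
        [100, 200, 300],        # vector_size
        [2, 5, 10],             # window
        [10, 20, 30],           # epochs
        [2, 5, 10],             # count
        [0, 1],                 # dm
        [10e-4, 10e-5, 10e-6],  # sample
    ))
    # Pool() defaults to one worker per CPU core.
    with Pool() as pool:
        scores = pool.map(train_and_score, combos)
    with open("results.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["vector_size", "window", "epochs",
                         "count", "dm", "sample", "score"])
        for params, score in zip(combos, scores):
            writer.writerow(list(params) + [score])
```

Each worker handles one permutation at a time, so on a 36-core instance the wall-clock time drops roughly 36x (minus per-model overhead).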
Another thought for your meta-optimization:
Overdoing epochs shouldn't ever hurt - it'll just be wasteful. So you could do the big sweep with your largest value, epochs=30, and be fairly confident that the other parameters that are best with that maxed value won't improve much with fewer epochs.
(But, especially if you need to re-run the job often, 30 might only be marginally better than some smaller epochs count - so you could then separately run a test to balance time/cost and evaluation quality.)