How is GridSearchCV() (and/or RandomizedSearchCV()) implemented in scikit-learn? I wonder about the following: when using one of these techniques, how are the iterations, folds, and hyperparameter combinations taken into account? Here is a picture that summarizes my confusion:
What happens when and how often? Maybe to keep it simple, let's assume a single neural network acting as our model. My understanding so far:
In the first iteration, the model is fit on the training data, which is split into several folds. Here I struggle already: is the model trained on a single fold and then tested on the validation fold? What happens then with the next fold? Does the model keep the weights obtained from its first training fold, or is it re-initialized for the next training fold?
To be more precise: in the first iteration, is the model fit four times and tested four times on the validation set, independently across all folds?
When the next iteration begins, the model keeps no information from the first iteration, right? Thus, are all iterations and all folds independent of each other? How are the hyperparameters tuned here?
In the above example, there are 25 folds in total. Is the model, with a constant set of hyperparameters, fit and tested 20 times? Let's say we have two hyperparameters to tune, learning rate and dropout rate, each with two levels.
Will the neural net now be fitted 80 times? And when we have not just a single model but, e.g., two models (a neural network and a random forest), will the whole procedure be performed twice?
Is there a possibility to see how many folds (and thus fits) GridSearchCV() will consider? (A sketch of how such a count could be checked follows below.)
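For reference, here is a minimal sketch of how such a count can be checked up front, assuming the hypothetical two-by-two grid described above (the parameter names and values below are placeholders, not any particular network's):

```python
# A minimal sketch: GridSearchCV runs one fit per (candidate, fold) pair,
# and the number of candidates can be counted up front with ParameterGrid.
from sklearn.model_selection import ParameterGrid

param_grid = {
    "learning_rate": [0.001, 0.01],  # two levels of learning rate (placeholder values)
    "dropout_rate": [0.2, 0.5],      # two levels of dropout rate (placeholder values)
}
n_candidates = len(ParameterGrid(param_grid))  # 2 * 2 = 4 combinations
cv = 5
print(n_candidates * cv)  # 20 fits (plus 1 final refit by default)
```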
I have seen Does GridSearchCV perform cross-validation?, Model help using Scikit-learn when using GridSearch, and scikit-learn GridSearchCV with multiple repetitions, but I can't find a clear and precise answer to my questions.
So, the k-fold method:
You split your training set into k parts (folds), for example 5. You take the first part as the validation set and the 4 other parts as the training set. You train, and this gives you a training/CV performance. You do this 5 times (once per fold), so that each fold becomes the validation set once and the others the training set. At the end, you take the mean of the performances to obtain the CV performance of your model. That is k-fold cross-validation; a minimal sketch follows below.
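Here is a minimal sketch of that procedure (the toy data and the SVC estimator are just for illustration). Note that a fresh clone is fitted for each fold, so nothing, such as learned weights, carries over between folds:

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import KFold
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = rng.random((100, 4)), rng.integers(0, 2, 100)  # toy dataset

model = SVC()
scores = []
for train_idx, val_idx in KFold(n_splits=5).split(X):
    fold_model = clone(model)                    # re-initialized every fold
    fold_model.fit(X[train_idx], y[train_idx])   # train on the 4 other parts
    scores.append(fold_model.score(X[val_idx], y[val_idx]))  # score on the held-out part

print(np.mean(scores))  # mean over the 5 folds = the CV performance
```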
Now, GridSearchCV is a hyperparameter tuner which uses the k-fold method. The principle is: you give GridSearchCV a dictionary with all the hyperparameter values you want to test; it then evaluates every combination (running a full k-fold cross-validation for each) and selects the set of hyperparameters with the best mean CV performance. It can take a very long time.
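Conceptually, it behaves like the following simplified sketch (a simplification only: the real implementation adds parallelism, custom scoring, a final refit on the whole training set, and more):

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import KFold, ParameterGrid

def naive_grid_search(model, param_grid, X, y, cv=5):
    """Rough sketch of what GridSearchCV does, not the actual implementation."""
    best_score, best_params = -np.inf, None
    for params in ParameterGrid(param_grid):        # every hyperparameter combination
        fold_scores = []
        for tr, va in KFold(n_splits=cv).split(X):  # fresh k-fold run per candidate
            m = clone(model).set_params(**params)   # re-initialized model each fold
            m.fit(X[tr], y[tr])
            fold_scores.append(m.score(X[va], y[va]))
        if np.mean(fold_scores) > best_score:       # keep the best mean CV score
            best_score, best_params = np.mean(fold_scores), params
    return best_params, best_score
```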
You pass a model (estimator) to GridSearchCV, a set of parameters, and, if you want, the number of folds.
Example:
GridSearchCV(SVC(), parameters, cv=5)
where SVC() is the estimator, parameters is your dictionary of hyperparameters, and cv is the number of folds.
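For completeness, here is a runnable version of that example on toy data (the iris dataset and this particular grid are only illustrative). It also addresses the question about seeing the number of folds: with verbose=1, scikit-learn prints the tally of folds, candidates, and total fits, and the fitted object exposes n_splits_:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
parameters = {"C": [0.1, 1, 10, 100], "kernel": ["linear", "rbf"]}

# verbose=1 prints something like:
# "Fitting 5 folds for each of 8 candidates, totalling 40 fits"
search = GridSearchCV(SVC(), parameters, cv=5, verbose=1)
search.fit(X, y)

print(search.best_params_)  # the winning hyperparameter set
print(search.best_score_)   # its mean CV score over the 5 folds
print(search.n_splits_)     # the number of folds actually used
```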