
Is it okay to have non-trainable params in machine learning?


When building a model for machine learning, is it okay to have non-trainable params? Or does this create errors in the model? I'm confused as to what non-trainable params actually are and how to fix your model based on that.


Solution

  • EDIT: as enumaris has mentioned in comments, the question is probably referring to non-trainable parameters in Keras rather than non-trainable parameters in general (hyperparameters)

    Non-trainable parameters in Keras are described in answer to this question.

    ...non-trainable parameters of a model are those that you will not update or optimize during training, and that have to be defined a priori, or passed as inputs.

    Examples of such parameters are:

    1. the number of hidden layers
    2. the number of nodes in each hidden layer
      and others

    These parameters are "non-trainable" because you can't optimize their values from your training data.
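    To make the "trainable vs. non-trainable" split concrete, here is a minimal, dependency-free sketch of how a summary like Keras's `model.summary()` counts parameters when one layer is frozen (e.g. via `layer.trainable = False`). The layer sizes are made up for illustration; this is not the Keras implementation itself.

```python
# Toy parameter count for dense layers: weights (inputs * units) plus one
# bias per unit. A frozen layer's parameters still exist, they are just
# excluded from the "trainable" total.

def dense_params(n_inputs, n_units):
    """Number of parameters in a fully connected layer."""
    return n_inputs * n_units + n_units

# (input_size, units, trainable) for each layer of a hypothetical network
layers = [
    (10, 32, True),   # trainable hidden layer
    (32, 16, False),  # frozen layer: weights are kept but never updated
    (16, 1, True),    # trainable output layer
]

trainable = sum(dense_params(i, u) for i, u, t in layers if t)
non_trainable = sum(dense_params(i, u) for i, u, t in layers if not t)

print("Trainable params:", trainable)          # 369
print("Non-trainable params:", non_trainable)  # 528
```

    This mirrors what you see at the bottom of a Keras model summary: the two totals simply partition the same set of weights by whether the optimizer is allowed to update them.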

    To address your questions:

    Is it okay to have non-trainable params?

    Yes, it is okay, and in fact it is inevitable if you are building a neural network or almost any other machine learning model.

    Does this create errors in the model?

    It does not create an error by default; rather, it determines the architecture of your neural network.

    But some architectures will perform better on your data and task than others.

    So if you choose sub-optimal non-trainable parameters, you may underfit your data.

    Optimizing non-trainable parameters is a whole other, quite broad topic.
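    The simplest version of that topic is a grid search: try each combination of hyperparameters, score it on held-out data, and keep the best. Below is a minimal sketch where `validation_score` is a made-up stand-in for "train a model with this configuration and return its validation accuracy"; in practice you would plug in real training and evaluation there (or use a tool such as scikit-learn's `GridSearchCV`).

```python
# Minimal grid-search sketch over two hypothetical hyperparameters.

from itertools import product

def validation_score(hidden_layers, nodes):
    # Stand-in for training + evaluating a real model; this toy formula
    # just peaks at hidden_layers=2, nodes=64 for illustration.
    return 1.0 - abs(hidden_layers - 2) * 0.1 - abs(nodes - 64) / 1000

grid = {
    "hidden_layers": [1, 2, 3],
    "nodes": [32, 64, 128],
}

best_params, best_score = None, float("-inf")
for hl, n in product(grid["hidden_layers"], grid["nodes"]):
    score = validation_score(hl, n)
    if score > best_score:
        best_params, best_score = {"hidden_layers": hl, "nodes": n}, score

print(best_params)  # -> {'hidden_layers': 2, 'nodes': 64}
```

    Grid search is exhaustive and gets expensive quickly; random search and Bayesian optimization are common alternatives once the grid grows.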


    Answer for general machine learning theory:

    Non-trainable parameters (not specifically in Keras) are called hyperparameters.

    Their purpose is to adapt an algorithm to your specific requirements.

    For example, if you are training a simple logistic regression, you have a parameter C, which controls the strength of regularization, i.e. how heavily the model is penalized for complexity (large weights). Note that in scikit-learn, C is the inverse of the regularization strength: smaller C means stronger regularization.

    You might want to regularize heavily so the model generalizes more (though it may underfit), or regularize only lightly (which can lead to overfitting).
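    The trade-off C controls can be sketched numerically. In scikit-learn's `LogisticRegression`, the objective is roughly `C * data_loss + 0.5 * ||w||^2`, so a small C lets the weight penalty dominate (simpler model) and a large C lets the data-fit term dominate. The numbers below are toy values for illustration.

```python
# Sketch of an L2-penalized objective with scikit-learn's C convention:
# minimize  C * data_loss + 0.5 * ||w||^2

def penalized_loss(data_loss, weights, C):
    l2 = 0.5 * sum(w * w for w in weights)  # 0.5 * ||w||^2
    return C * data_loss + l2

weights = [2.0, -1.0]  # toy weight vector, ||w||^2 = 5
data_loss = 1.0        # toy unregularized (data-fit) loss

for C in (0.01, 1.0, 100.0):
    print(C, penalized_loss(data_loss, weights, C))
```

    With C = 0.01 the penalty term (2.5) dwarfs the data-fit term (0.01), pushing the optimizer toward small weights; with C = 100 the data-fit term dominates and regularization barely matters.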

    This is something you can't learn from the data; it is something you adjust to fit your particular needs.