scikit-learn gradient-descent

SGDClassifier save loss from every iteration to array


When I train an SGDClassifier in scikit-learn, I can print the loss value from every iteration by setting verbose. How can I store these values in an array?


Solution

  • This adapts the answer from this post.

    import sys
    import numpy as np
    from io import StringIO
    import matplotlib.pyplot as plt
    from sklearn.linear_model import SGDClassifier
    from tensorflow.keras.datasets import mnist
    
    # Load MNIST and flatten each 28x28 image into a 784-feature vector
    (x_tr, y_tr), (x_te, y_te) = mnist.load_data()
    x_tr, x_te = x_tr.reshape(-1, 784), x_te.reshape(-1, 784)
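If TensorFlow is not installed, one offline stand-in for trying the steps below is scikit-learn's bundled digits dataset (8x8 images, so 64 features instead of 784). This is a minimal sketch that swaps in a different, much smaller dataset purely for quick experiments; it is an assumption, not part of the original answer.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Assumption: the small bundled digits set stands in for MNIST (no download needed)
digits = load_digits()

# digits.data is already flattened: one row of 64 pixel values per 8x8 image
x_tr, x_te, y_tr, y_te = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)
print(x_tr.shape, x_te.shape)
```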
    

    Redirect sys.stdout to a StringIO buffer so that the output printed by SGDClassifier is captured:

    old_stdout = sys.stdout
    sys.stdout = mystdout = StringIO()
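Swapping sys.stdout by hand leaves it redirected if fit raises an exception before stdout is restored. A minimal alternative sketch uses contextlib.redirect_stdout from the standard library, which restores the original stream when the with block exits. The make_classification data is an illustrative assumption standing in for MNIST.

```python
import contextlib
from io import StringIO

from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Illustrative binary data (assumption: stands in for the MNIST arrays)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

buffer = StringIO()
# Everything printed inside the with block goes to buffer; the original
# sys.stdout is restored on exit, even if fit() raises an exception
with contextlib.redirect_stdout(buffer):
    clf = SGDClassifier(verbose=1, random_state=0)
    clf.fit(X, y)

loss_history = buffer.getvalue()
print(loss_history.count("loss: "), "loss lines captured")
```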
    

    Train the model with verbose=1 so that it prints its average loss after every epoch:

    clf = SGDClassifier(verbose=1)
    clf.fit(x_tr, y_tr)
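With verbose=1, each epoch of each binary model prints one line containing "Avg. loss: ". A small sketch to check this on toy data (make_classification and load_iris are stand-ins for MNIST; count_loss_lines is a hypothetical helper): for a binary problem the count should equal clf.n_iter_, while for a multiclass problem one-vs-rest prints the lines of every per-class model, so the count is at least clf.n_iter_, which is the maximum over those models.

```python
import contextlib
from io import StringIO

from sklearn.datasets import load_iris, make_classification
from sklearn.linear_model import SGDClassifier


def count_loss_lines(X, y):
    """Fit a verbose SGDClassifier and count the 'loss: ' lines it prints."""
    buffer = StringIO()
    with contextlib.redirect_stdout(buffer):
        clf = SGDClassifier(verbose=1, random_state=0).fit(X, y)
    return buffer.getvalue().count("loss: "), clf


# Binary problem: a single model, one loss line per epoch
Xb, yb = make_classification(n_samples=200, n_features=5, random_state=0)
n_binary, clf_b = count_loss_lines(Xb, yb)
print(n_binary, clf_b.n_iter_)

# Three-class problem: one-vs-rest fits three binary models, whose
# per-epoch lines are printed one after another
Xm, ym = load_iris(return_X_y=True)
n_multi, clf_m = count_loss_lines(Xm, ym)
print(n_multi, clf_m.n_iter_)
```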
    

    Restore the original sys.stdout and read the captured text:

    sys.stdout = old_stdout
    loss_history = mystdout.getvalue()
    

    Create a list to store the loss values

    loss_list = []
    

    Parse the captured text line by line and append each value that follows "loss: " (the verbose output reports it as "Avg. loss: ...") to the list:

    for line in loss_history.split('\n'):
        if "loss: " not in line:
            continue  # skip epoch headers, timing and convergence lines
        loss_list.append(float(line.split("loss: ")[-1]))
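A minimal alternative sketch of the parsing step uses a regular expression instead of string splitting. The helper parse_losses and the sample text are hypothetical; the sample only mimics the "loss: <number>" pattern the loop above looks for. Unlike float() on the split text, this pattern does not match "nan" or "inf".

```python
import re

# Matches a number such as 0.25, 1e-05 or -3.2E+01 right after "loss: "
LOSS_PATTERN = re.compile(r"loss: ([-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)")


def parse_losses(text):
    """Return every number that follows 'loss: ' in text, as floats."""
    return [float(m) for m in LOSS_PATTERN.findall(text)]


# Hypothetical captured output: only lines containing "loss: " contribute
sample = "epoch 1\nAvg. loss: 0.5\nsome other line\nAvg. loss: 0.25\n"
print(parse_losses(sample))  # [0.5, 0.25]
```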
    

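    Plot the loss curve. MNIST has ten classes, and SGDClassifier handles multiclass problems with one-vs-rest, training one binary classifier per class, so loss_list holds the per-epoch losses of all ten classifiers one after another.

    plt.figure()
    plt.plot(np.arange(len(loss_list)), loss_list)
    plt.xlabel("Epoch (all classifiers, concatenated)")
    plt.ylabel("Loss")
    plt.show()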
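Because the multiclass list mixes several classifiers, one way to get a separate curve per class is to start a new group whenever a model's log restarts at epoch 1. This sketch assumes each binary model's log begins with a line reading exactly "-- Epoch 1" and that, with the default n_jobs, the per-class models are fitted and printed in the order of clf.classes_. The helper split_losses_per_model is hypothetical, and iris stands in for MNIST.

```python
import contextlib
from io import StringIO

from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier


def split_losses_per_model(text):
    """Group 'loss: ' values into one list per binary model.

    Assumes each model's verbose log starts with a line reading exactly
    '-- Epoch 1' (an exact match, so '-- Epoch 10' does not count).
    """
    runs = []
    for line in text.splitlines():
        if line.strip() == "-- Epoch 1":
            runs.append([])  # a new one-vs-rest model starts training
        elif "loss: " in line and runs:
            runs[-1].append(float(line.split("loss: ")[-1]))
    return runs


X, y = load_iris(return_X_y=True)
buffer = StringIO()
with contextlib.redirect_stdout(buffer):
    clf = SGDClassifier(verbose=1, random_state=0).fit(X, y)

per_class = split_losses_per_model(buffer.getvalue())
for label, losses in zip(clf.classes_, per_class):
    print(f"class {label}: {len(losses)} epochs, final loss {losses[-1]:.4f}")
```

Each list in per_class can then be passed to plt.plot on its own, giving one curve per class with a true epoch axis.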
    

    Finally, convert the list to a NumPy array:

    loss_list = np.array(loss_list)
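If "save" should also mean keeping the values on disk, a minimal sketch uses np.save and np.load. The file name loss_history.npy and the example values are hypothetical; a temporary directory keeps the example self-contained.

```python
import os
import tempfile

import numpy as np

# Hypothetical loss values standing in for the parsed loss_list
loss_list = np.array([0.9, 0.5, 0.3, 0.25])

# Write the array to a .npy file, then read it back
path = os.path.join(tempfile.mkdtemp(), "loss_history.npy")
np.save(path, loss_list)

restored = np.load(path)
print(restored)
```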