python-3.x, machine-learning, neural-network, logistic-regression

How to get the correct answer for logistic regression?


I'm not getting the desired output on a binary classification problem.

The task is to use binary classification to label breast cancer as either:

- benign, or
- malignant

First, there is a function to load the dataset, which returns the train and test data with these shapes:

x_train is of shape: (30, 381),
y_train is of shape: (1, 381),
x_test is of shape:  (30, 188),
y_test is of shape:  (1, 188).

Then there is a class for the logistic regression classifier, which predicts the output.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import numpy as np

def load_dataset():
    cancer_data = load_breast_cancer()
    x_train, x_test, y_train, y_test = train_test_split(cancer_data.data, cancer_data.target, test_size=0.33)
    x_train = x_train.T
    x_test = x_test.T
    y_train = y_train.reshape(1, (len(y_train)))
    y_test = y_test.reshape(1, (len(y_test)))
    m = x_train.shape[1]
    return x_train, x_test, y_train, y_test, m

class Neural_Network():
    def __init__(self):
        np.random.seed(1)
        self.weights = np.random.rand(30, 1) * 0.01
        self.bias = np.zeros(shape=(1, 1))

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def train(self, x_train, y_train, iterations, m, learning_rate=0.5):

        for i in range(iterations):
            # Forward pass: linear combination of the inputs followed by the sigmoid
            z = np.dot(self.weights.T, x_train) + self.bias
            a = self.sigmoid(z)

            # Cross-entropy cost, averaged over the m training samples
            cost = (-1 / m) * np.sum(y_train * np.log(a) + (1 - y_train) * np.log(1 - a))

            if (i % 500 == 0):
                print("Cost after iteration %i: %f" % (i, cost))

            # Gradients of the cost with respect to the weights and bias
            dw = (1 / m) * np.dot(x_train, (a - y_train).T)
            db = (1 / m) * np.sum(a - y_train)

            # Gradient descent update
            self.weights = self.weights - learning_rate * dw
            self.bias = self.bias - learning_rate * db

    def predict(self, inputs):
        m = inputs.shape[1]
        y_predicted = np.zeros((1, m))
        z = np.dot(self.weights.T, inputs) + self.bias
        a = self.sigmoid(z)
        for i in range(a.shape[1]):
            y_predicted[0, i] = 1 if a[0, i] > 0.5 else 0
        return y_predicted

if __name__ == "__main__":
    '''
    step-1 : Loading data set
                 x_train is of shape: (30, 381)
                 y_train is of shape: (1, 381)
                 x_test is of shape:  (30, 188)
                 y_test is of shape:  (1, 188)
    '''

    x_train, x_test, y_train, y_test, m = load_dataset()

    neuralNet = Neural_Network()

    '''
       step-2 : Train the network
    '''

    neuralNet.train(x_train, y_train, 10000, m)


    y_predicted = neuralNet.predict(x_test)

    print("Accuracy on test data: ")
    print(accuracy_score(y_test, y_predicted)*100)

The program gives this output:

    C:\Python36\python.exe C:/Users/LENOVO/PycharmProjects/MarkDmo001/Numpy.py
Cost after iteration 0: 5.263853
C:/Users/LENOVO/PycharmProjects/MarkDmo001/logisticReg.py:25: RuntimeWarning: overflow encountered in exp
  return 1 / (1 + np.exp(-x))
C:/Users/LENOVO/PycharmProjects/MarkDmo001/logisticReg.py:33: RuntimeWarning: divide by zero encountered in log
  cost = (-1 / m) * np.sum(y_train * np.log(a) + (1 - y_train) * np.log(1 - a))
C:/Users/LENOVO/PycharmProjects/MarkDmo001/logisticReg.py:33: RuntimeWarning: invalid value encountered in multiply
  cost = (-1 / m) * np.sum(y_train * np.log(a) + (1 - y_train) * np.log(1 - a))
Cost after iteration 500: nan
Cost after iteration 1000: nan
Cost after iteration 1500: nan
Cost after iteration 2000: nan
Cost after iteration 2500: nan
Cost after iteration 3000: nan
Cost after iteration 3500: nan
Cost after iteration 4000: nan
Cost after iteration 4500: nan
Cost after iteration 5000: nan
Cost after iteration 5500: nan
Cost after iteration 6000: nan
Cost after iteration 6500: nan
Cost after iteration 7000: nan
Cost after iteration 7500: nan
Cost after iteration 8000: nan
Cost after iteration 8500: nan
Cost after iteration 9000: nan
Cost after iteration 9500: nan

Accuracy: 
0.0

Solution

  • The problem is exploding gradients. You need to normalize your input to [0, 1].

    If you look at feature 3 and feature 23 in your training data, you will see values larger than 3000. Even after being multiplied by your initial weights, these contributions still lie in the range [0, 30]. Thus, in the first iteration, the z vector contains only positive numbers, with values up to around 50. As a result, the a vector (the output of your sigmoid) looks like this:

    [0.9994797 0.99853904 0.99358676 0.99999973 0.98392862 0.99983016 0.99818802 ...]
    

    So in the first step, your model always predicts 1 with high confidence. But that is not always correct, and the high probabilities your model outputs lead to a large gradient, which you can see when you look at the largest values of dw. In my case,

    • dw[3] was 388
    • dw[23] was 571

    and the other values lay in [0, 55]. So you can clearly see how the large inputs in these features lead to an exploding gradient. Because gradient descent now takes a far too large step in the opposite direction, the weights in the next step are no longer in [0, 0.01] but in [-285, 0.002], which only makes things worse. In the next iteration, z contains values around -1 million, which leads to the overflow in the sigmoid function.
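
    To see this concretely, here is a small standalone sketch (not part of the question's code; it uses the full, un-split dataset, so the exact numbers will differ from the values quoted above) that reproduces the first forward pass with the question's initialization:

    import numpy as np
    from sklearn.datasets import load_breast_cancer

    x = load_breast_cancer().data.T      # shape (30, 569): features as rows, like x_train
    np.random.seed(1)
    w = np.random.rand(30, 1) * 0.01     # same initialization as in the question

    z = np.dot(w.T, x)                   # first-iteration pre-activations (bias is still 0)
    a = 1 / (1 + np.exp(-z))             # sigmoid
    print(z.min(), z.max())              # all positive, on the order of tens
    print(a.min())                       # every output is above 0.5, so every prediction is 1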

    Solution

    1. Normalize your inputs to [0, 1]
    2. Use weights in [-0.01, 0.01], so that they roughly cancel each other out. Otherwise, your values in z still scale linearly with the number of features you have (a minimal sketch follows this list).
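
    A minimal sketch of point 2 (a suggested replacement for the initialization in __init__, not code from the question):

    import numpy as np

    np.random.seed(1)
    # Draw weights uniformly from [-0.01, 0.01] instead of [0, 0.01], so that
    # positive and negative contributions to z roughly cancel each other out
    weights = (np.random.rand(30, 1) - 0.5) * 0.02
    bias = np.zeros(shape=(1, 1))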

    As for normalizing the inputs, you can use sklearn's MinMaxScaler:

    from sklearn.preprocessing import MinMaxScaler  # needed for the scaler below

    x_train, x_test, y_train, y_test, m = load_dataset()

    scaler = MinMaxScaler()
    x_train_normalized = scaler.fit_transform(x_train.T).T

    neuralNet = Neural_Network()

    '''
       step-2 : Train the network
    '''

    neuralNet.train(x_train_normalized, y_train, 10000, m)

    # Use the same transformation on the test inputs as on the training inputs
    x_test_normalized = scaler.transform(x_test.T).T
    y_predicted = neuralNet.predict(x_test_normalized)
    

    The .Ts are because sklearn expects training inputs to have the shape (num_samples, num_features), while your x_train and x_test have the shape (num_features, num_samples).
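
    If the transpose is hard to picture, this tiny standalone example (with made-up numbers, purely for illustration) shows the round trip:

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler

    # 2 features x 3 samples, in the question's (num_features, num_samples) layout
    x = np.array([[1.0, 2.0, 3000.0],
                  [4.0, 5.0, 6000.0]])

    # Transpose to (num_samples, num_features) for sklearn, scale each feature
    # to [0, 1], then transpose back to the original layout
    scaled = MinMaxScaler().fit_transform(x.T).T
    print(scaled)   # each row (feature) now lies in [0, 1]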