python numpy logistic-regression numpy-ndarray

I want to split data (positive/negative) and put them in empty numpy arrays. (LOGISTIC REGRESSION EXAMPLE)


I am stuck on the following problem. I have a set of email IDs and their respective labels, 0 or 1 (the tag values used in logistic regression). The data is as follows:

input_x = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])
input_y = np.array([0,1,0,0,1,1,1,0,0,0,0,1,0,1,0])

Now I want to split this data into two sets: one with all the 0 labels and their corresponding "input_x" values, and the other with all the 1 labels and their corresponding "input_x" values. For that I wrote this function:

def split_data(x,y):
    shpx = x.shape[0]
    shpy = y.shape[0]
    neg_data = 0
    pos_data = 0
    for i in range(shpy):
        if y[i] == 0:
            neg_data = neg_data + 1
        else:
            pos_data = pos_data + 1
        
    print(f"Number of negative (0) values = {neg_data}")
    print(f"Number of positive (1) values = {pos_data}")

    emp_neg_data_x = np.zeros(neg_data)
    emp_neg_data_y = np.zeros(neg_data)
    emp_pos_data_x = np.zeros(pos_data)
    emp_pos_data_y = np.zeros(pos_data)

    for j in range(neg_data):
        for k in range(shpx):
            if y[k] == 0:
                emp_neg_data_x[j] = x[j]
                emp_neg_data_y[j] = 0
            else:
                pass
    for m in range(pos_data):
        for n in range(shpx):
            if y[n] == 0:
                emp_pos_data_x[m] = x[m]
                emp_pos_data_y[m] = 1
            else:
                pass

    return emp_neg_data_x,emp_neg_data_y,emp_pos_data_x,emp_pos_data_y

Here the arguments x and y are the input arrays. Running this function gives:

Number of negative (0) values = 9
Number of positive (1) values = 6
[1. 2. 3. 4. 5. 6. 7. 8. 9.]
[0. 0. 0. 0. 0. 0. 0. 0. 0.]
[1. 2. 3. 4. 5. 6.]
[1. 1. 1. 1. 1. 1.]

The emp_neg_data_y and emp_pos_data_y arrays are correct, but the other two arrays simply contain the sequence 1, 2, 3, ... instead of the email_idx/input_x values corresponding to 0 and 1. Can you help me out? (I guess the problem is in the loops, but I am stuck.)


Solution
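
As for why the loops in the question return 1, 2, 3, ...: the inner loop tests y[k] but then copies x[j], where j is the output slot rather than the position being tested, so the first neg_data values of x are written in order. The positive loop has the same issue and also tests y[n] == 0 instead of 1. A minimal corrected sketch, keeping the preallocated arrays but using a single pass with two write counters, might look like the following. The name split_data_fixed is hypothetical, and the code assumes the labels are only 0 or 1, as the else branch in the question does.

```python
import numpy as np

def split_data_fixed(x, y):
    # Hypothetical name; assumes y contains only the labels 0 and 1.
    neg_count = int(np.sum(y == 0))
    pos_count = y.shape[0] - neg_count

    # Preallocate the outputs, as in the question.
    neg_x = np.zeros(neg_count, dtype=x.dtype)
    neg_y = np.zeros(neg_count, dtype=y.dtype)
    pos_x = np.zeros(pos_count, dtype=x.dtype)
    pos_y = np.ones(pos_count, dtype=y.dtype)

    j = 0  # next free slot in the negative arrays
    m = 0  # next free slot in the positive arrays
    for k in range(y.shape[0]):
        if y[k] == 0:
            neg_x[j] = x[k]  # copy the value at the tested position k
            j += 1
        else:
            pos_x[m] = x[k]
            m += 1
    return neg_x, neg_y, pos_x, pos_y

input_x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
input_y = np.array([0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0])
neg_x, neg_y, pos_x, pos_y = split_data_fixed(input_x, input_y)
print(neg_x.tolist())  # [1, 3, 4, 8, 9, 10, 11, 13, 15]
print(pos_x.tolist())  # [2, 5, 6, 7, 12, 14]
```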

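  • First make a dictionary that maps each x to its label y (this assumes the x values are unique):

    import numpy as np

    input_x = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])
    input_y = np.array([0,1,0,0,1,1,1,0,0,0,0,1,0,1,0])
    # Pair values by position; .tolist() yields plain Python ints, which print cleanly
    y_dict = dict(zip(input_x.tolist(), input_y.tolist()))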
    

    Create your lists and print:

    emp_neg_data_x = [x for x, y in y_dict.items() if y == 0]
    emp_neg_data_y = [y for x, y in y_dict.items() if y == 0]
    emp_pos_data_x = [x for x, y in y_dict.items() if y == 1]
    emp_pos_data_y = [y for x, y in y_dict.items() if y == 1]
    
    print(emp_neg_data_x)
    print(emp_neg_data_y)
    print(emp_pos_data_x)
    print(emp_pos_data_y)
    

    Output:

    [1, 3, 4, 8, 9, 10, 11, 13, 15]
    [0, 0, 0, 0, 0, 0, 0, 0, 0]
    [2, 5, 6, 7, 12, 14]
    [1, 1, 1, 1, 1, 1]
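
If the results should stay as NumPy arrays, as the question asks, the same split can probably be done without a dictionary or explicit loops by using boolean mask indexing. This is a sketch of an alternative, not the approach described above; the variable names neg_mask, pos_mask, neg_x, neg_y, pos_x and pos_y are illustrative.

```python
import numpy as np

input_x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
input_y = np.array([0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0])

# Boolean arrays marking which positions hold each label
neg_mask = input_y == 0
pos_mask = input_y == 1

# Indexing with a boolean mask keeps only the elements where the mask is True
neg_x, neg_y = input_x[neg_mask], input_y[neg_mask]
pos_x, pos_y = input_x[pos_mask], input_y[pos_mask]

print(neg_x.tolist())  # [1, 3, 4, 8, 9, 10, 11, 13, 15]
print(neg_y.tolist())  # [0, 0, 0, 0, 0, 0, 0, 0, 0]
print(pos_x.tolist())  # [2, 5, 6, 7, 12, 14]
print(pos_y.tolist())  # [1, 1, 1, 1, 1, 1]
```

Unlike the dictionary version, this does not require the x values to be unique, and the results are arrays with the same dtype as the inputs.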