Tags: python, numpy, vector, scikit-learn, bitwise-operators

Vector labels in Python


I am studying from a machine learning book, and these lines appear in one of its code listings:

X_train_01_subset = X_train[(y_train == 0) | (y_train == 1)]
y_train_01_subset = y_train[(y_train == 0) | (y_train == 1)]

X_train is a 104x2 array of training samples, and y_train is a 104x1 vector containing the sample labels: 0, 1 and 2.

What does

[(y_train == 0) | (y_train == 1)]

do when applied to X_train and y_train? (The full code is fairly long; let me know if you need me to post all of it.)


Solution

  • Let's break it down step by step. First, this

    (y_train == 0)
    (y_train == 1)
    

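    is an element-wise comparison that generates a boolean mask: an array of the same shape as y_train that is True where the label matches and False elsewhere.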

    Then, this:

    (y_train == 0) | (y_train == 1)
    

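    is a bitwise OR operation, applied element by element. For boolean masks it acts as a logical OR: the result is True wherever the label is 0 or 1, and False otherwise. The parentheses are required, because | binds more tightly than == in Python.

    Here is an example using integer arrays of 0s and 1s, where the output is 1 if either or both inputs are 1, and 0 otherwise: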

    In [21]: import numpy as np

    # inputs
    In [22]: a = np.array([1, 1, 0, 0])
    In [23]: b = np.array([1, 0, 1, 0])

    # element-wise bitwise OR
    In [24]: a | b
    Out[24]: array([1, 1, 1, 0])
    

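    Finally, we use the resulting mask as an index. Boolean indexing keeps only the rows where the mask is True, so this retrieves the samples from X_train whose label is 0 or 1 (the same mask applied to y_train keeps the matching labels):

    X_train[(y_train == 0) | (y_train == 1)]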
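
The steps above can be put together in a small end-to-end sketch. The data here is a made-up toy set of 6 samples (not the 104 samples from the book), and the intermediate names is_zero, is_one and mask are introduced only for illustration; the last line shows np.isin as one possible equivalent way to build the same mask.

```python
import numpy as np

# Hypothetical toy data: 6 samples with 2 features each, labels 0, 1 and 2.
X_train = np.array([[5.1, 1.4],
                    [4.9, 1.5],
                    [6.3, 4.7],
                    [5.8, 4.1],
                    [7.1, 5.9],
                    [6.5, 5.8]])
y_train = np.array([0, 0, 1, 1, 2, 2])

# Step 1: element-wise comparisons produce boolean masks.
is_zero = (y_train == 0)  # True where the label is 0
is_one = (y_train == 1)   # True where the label is 1

# Step 2: element-wise OR combines the two masks.
mask = is_zero | is_one   # True where the label is 0 or 1

# Step 3: boolean indexing keeps only the rows where the mask is True.
X_train_01_subset = X_train[mask]
y_train_01_subset = y_train[mask]

print(X_train_01_subset.shape)  # (4, 2): the two class-2 samples are dropped
print(y_train_01_subset)        # only labels 0 and 1 remain

# An equivalent mask, built with np.isin instead of two comparisons.
same_mask = np.isin(y_train, [0, 1])
```

Applying one mask to both arrays keeps the samples and their labels aligned, which is why the book indexes X_train and y_train with the same expression.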