I am studying from a machine learning book, and the following appears in part of the code:
X_train_01_subset = X_train[(y_train == 0) | (y_train == 1)]
y_train_01_subset = y_train[(y_train == 0) | (y_train == 1)]
X_train is a 104x2 array of training samples, and y_train is a 104x1 array containing the sample labels: 0, 1 and 2.
What does
[(y_train == 0) | (y_train == 1)]
do when applied to X_train and y_train? (The algorithm is fairly long; if you need the full code, let me know.)
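Breaking it into steps: first, each of these comparisons
(y_train == 0)
(y_train == 1)
generates a boolean mask, that is, an array of the same shape as y_train that is True where the label matches and False elsewhere.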
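To make the mask concrete, here is a minimal sketch. The label values are made up for illustration and are not the book's data:

```python
import numpy as np

# Hypothetical labels standing in for y_train (not the book's data)
y_train = np.array([0, 1, 2, 1, 0, 2])

mask_0 = (y_train == 0)  # True where the label is 0
mask_1 = (y_train == 1)  # True where the label is 1

print(mask_0)  # [ True False False False  True False]
print(mask_1)  # [False  True False  True False False]
```

Each mask has one entry per label, so it can later be lined up row by row with X_train.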
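Then, this:
(y_train == 0) | (y_train == 1)
is an element-wise OR of the two masks. The | operator is bitwise OR, which on boolean arrays acts as a logical OR: each output element is True if either or both of the corresponding inputs are True, and False otherwise. With integer arrays of 0s and 1s it likewise outputs 1 if either or both of the values is 1, else 0.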
Here is an example:
In [21]: import numpy as np
# inputs
In [22]: a = np.array([1, 1, 0, 0])
In [23]: b = np.array([1, 0, 1, 0])
# bitwise or
In [24]: a | b
Out[24]: array([1, 1, 1, 0])
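The example above uses integer arrays. As a rough sketch of the same operation on the boolean masks that the comparisons actually produce (the label values here are made up), together with the equivalent np.logical_or call:

```python
import numpy as np

# Made-up labels for illustration
y = np.array([0, 1, 2, 1, 0, 2])

# Element-wise OR of the two boolean masks
combined = (y == 0) | (y == 1)

# np.logical_or gives the same result for boolean inputs
same = np.logical_or(y == 0, y == 1)

print(combined)                        # [ True  True False  True  True False]
print(np.array_equal(combined, same))  # True
```

Note that the parentheses around each comparison are required, because | binds more tightly than == in Python.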
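Finally, we use the resulting boolean mask as an index to retrieve the matching samples from X_train:
X_train[(y_train == 0) | (y_train == 1)]
This keeps only the rows whose label is 0 or 1 and drops the rows labeled 2. Applying the same mask to y_train keeps the corresponding labels.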
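Putting the steps together, here is a small self-contained sketch. The arrays are tiny made-up stand-ins (6 samples instead of 104), so only the mechanics carry over to the book's data:

```python
import numpy as np

# Made-up stand-ins for the book's data: 6 samples, 2 features each
X_train = np.array([[1.0, 2.0],
                    [3.0, 4.0],
                    [5.0, 6.0],
                    [7.0, 8.0],
                    [9.0, 10.0],
                    [11.0, 12.0]])
y_train = np.array([0, 1, 2, 1, 0, 2])

# Build the mask once and reuse it for both arrays
mask = (y_train == 0) | (y_train == 1)

X_train_01_subset = X_train[mask]  # rows whose label is 0 or 1
y_train_01_subset = y_train[mask]  # the matching labels

print(X_train_01_subset.shape)  # (4, 2)
print(y_train_01_subset)        # [0 1 1 0]

# np.isin is an alternative way to build the same mask
print(np.array_equal(mask, np.isin(y_train, [0, 1])))  # True
```

Here y_train is one-dimensional. If it were stored as a 2-D column of shape (n, 1), the mask would be 2-D as well and would need flattening (for example with ravel()) before it could index the rows of X_train.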