y = pd.Series([0, 1, 0, 1, 1, 0])
The code below uses this Series, and I am stuck on one point: what does y.values==0,0 mean, and how do the other combinations differ from one another?
plt.figure(dpi=120)
plt.scatter(pca[y.values==0,0], pca[y.values==0,1], alpha=0.5, label='Edible', s=2)
plt.scatter(pca[y.values==1,0], pca[y.values==1,1], alpha=0.5, label='Poisonous', s=2)
plt.legend()
Suppose you have the following NumPy array pca and Series y:
import pandas as pd
import numpy as np
pca = np.arange(0, 12).reshape(-1, 2)
y = pd.Series([0, 1, 0, 1, 1, 0])
# pca
array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11]])
# y
0    0
1    1
2    0
3    1
4    1
5    0
dtype: int64
To get elements from a 2D array, you pass a row selector and a column selector, separated by a comma:
# Get rows from pca where y==0 and get the first column (0)
>>> pca[y.values==0, 0] # or pca[y==0, 0]
array([ 0, 4, 10])
# Get rows from pca where y==0 and get the second column (1)
>>> pca[y.values==0, 1] # or pca[y==0, 1]
array([ 1, 5, 11])
# The same applies to the other scatter call, which selects the rows where y==1.
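For completeness, here is a small sketch of the y==1 case, using the same toy pca and y as above. The names x_poisonous and y_poisonous are only illustrative and do not appear in the original code:

```python
import numpy as np
import pandas as pd

pca = np.arange(0, 12).reshape(-1, 2)
y = pd.Series([0, 1, 0, 1, 1, 0])

# Rows where y == 1, first column: x-coordinates for the second scatter call
x_poisonous = pca[y.values == 1, 0]
# Rows where y == 1, second column: y-coordinates for the second scatter call
y_poisonous = pca[y.values == 1, 1]

print(x_poisonous)  # [2 6 8]
print(y_poisonous)  # [3 7 9]
```

So the four expressions in the plot differ only in which class is kept (y==0 or y==1) and which column is read (0 or 1).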
Instead of passing the selected rows explicitly, here you are using a boolean mask, y==0. The comparison returns another Series of the same length as y, holding boolean values:
>>> y == 0 # Original
0     True   # 0
1    False   # 1
2     True   # 0
3    False   # 1
4    False   # 1
5     True   # 0
dtype: bool
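To see that the mask is just another way of naming rows, the sketch below converts it to explicit row positions and checks that both forms select the same values. It uses the toy data from above; mask, rows, by_mask and by_index are illustrative names, and np.flatnonzero is one of several ways to list the True positions:

```python
import numpy as np
import pandas as pd

pca = np.arange(0, 12).reshape(-1, 2)
y = pd.Series([0, 1, 0, 1, 1, 0])

mask = y.values == 0          # boolean NumPy array, one entry per row of pca
rows = np.flatnonzero(mask)   # positions where the mask is True

by_mask = pca[mask, 0]        # select rows with the boolean mask
by_index = pca[rows, 0]       # select the same rows by explicit position

print(rows)                                # [0 2 5]
print(np.array_equal(by_mask, by_index))   # True
```

In other words, pca[y.values==0, 0] keeps the rows where the mask is True and then reads column 0 from them.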