Tags: python, pandas, data-science, series

If y is a pandas Series containing 0s and 1s, what do y.values==0,1 and y.values==0,0 mean?


y = pd.Series([0, 1, 0, 1, 1, 0])

The code below uses this and I am stuck on it. What does y.values==0,0 mean, and how do the other combinations differ from one another?

plt.figure(dpi=120)
plt.scatter(pca[y.values==0,0], pca[y.values==0,1], alpha=0.5, label='Edible', s=2)
plt.scatter(pca[y.values==1,0], pca[y.values==1,1], alpha=0.5, label='Poisonous', s=2)
plt.legend()

Solution

  • Suppose you have the following NumPy array pca and Series y:

    import pandas as pd
    import numpy as np
    
    pca = np.arange(0, 12).reshape(-1, 2)
    y = pd.Series([0, 1, 0, 1, 1, 0])
    
    # pca
    array([[ 0,  1],
           [ 2,  3],
           [ 4,  5],
           [ 6,  7],
           [ 8,  9],
           [10, 11]])
    
    # y
    0    0
    1    1
    2    0
    3    1
    4    1
    5    0
    dtype: int64
    

    To select elements from a 2D array, you pass a row selector and a column selector, separated by a comma:

    # Get rows from pca where y==0 and get the first column (0)
    >>> pca[y.values==0, 0]  # or pca[y==0, 0]
    array([ 0,  4, 10])
    
    # Get rows from pca where y==0 and get the second column (1)
    >>> pca[y.values==0, 1]  # or pca[y==0, 1]
    array([ 1,  5, 11])
    
    # The second scatter call works the same way, with y==1 selecting the other rows.
    

    Instead of listing the selected rows explicitly, the code uses a boolean mask, y==0. The comparison returns another Series of the same length as y, holding True where the value is 0 and False elsewhere:

    >>> y == 0   # Original
    0     True   # 0
    1    False   # 1
    2     True   # 0
    3    False   # 1
    4    False   # 1
    5     True   # 0
    dtype: bool
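
To make the mask idea concrete, here is a minimal sketch that reuses the toy pca and y from above. It shows that the comma inside the brackets separates the row selector from the column selector, and that indexing with the boolean mask gives the same result as listing the matching row positions explicitly. The names mask, rows, first_col_mask, first_col_rows and second_class are only illustrative and do not appear in the original code.

```python
import numpy as np
import pandas as pd

# Same toy data as in the answer above.
pca = np.arange(0, 12).reshape(-1, 2)
y = pd.Series([0, 1, 0, 1, 1, 0])

# The comma inside the brackets separates the row selector from the
# column selector, so pca[mask, 0] is the same as pca[(mask, 0)].
mask = y.values == 0            # NumPy boolean array, one entry per row
rows = np.flatnonzero(mask)     # explicit row positions where the mask is True

first_col_mask = pca[mask, 0]   # boolean-mask indexing
first_col_rows = pca[rows, 0]   # the same rows, listed explicitly

print(mask)            # [ True False  True False False  True]
print(rows)            # [0 2 5]
print(first_col_mask)  # [ 0  4 10]
print(first_col_rows)  # [ 0  4 10]

# The second scatter call selects the other class in the same way.
second_class = pca[y.values == 1]   # all columns of the rows where y == 1
print(second_class[:, 0])  # [2 6 8]
print(second_class[:, 1])  # [3 7 9]
```

In the plotting code from the question, each scatter call therefore receives the first column as x-coordinates and the second column as y-coordinates, restricted to the rows of one class.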