python, pandas, numpy, boolean-indexing

Boolean indexing, trying to search by label with two conditions but boolean and, bitwise &, and numpy logical_and all return errors


I am trying to return the rows of a dataframe in pandas that correspond to the label I choose. For example, in my function Female, it returns all the rows in which the patient is female. For AgeRange, I have run into issues satisfying both conditions without getting an error.

dataset = pd.read_csv('insurance.csv')

def Female(self):
    rows = dataset[dataset.sex == 1]
    print(rows)

def AgeRange(self):
    rows = dataset[dataset.age > 0] & dataset[dataset.age < 20]
    print(rows)

Using the bitwise operator gets me the error below: TypeError: unsupported operand type(s) for &: 'float' and 'bool'

def AgeRange(self):
    rows = dataset[dataset.age > 0] and dataset[dataset.age < 20]
    print(rows)

Using the boolean and operator gets me the error below: ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

def AgeRange(self):
    rows = np.logical_and(dataset[dataset.age > 0],dataset[dataset.age < 20])
    print(rows)

Using the numpy logical and gets me the error: ValueError: operands could not be broadcast together with shapes (1309,7) (135,7).

I'm honestly not sure what that leaves me with, or what is causing the issue in the first place. Can anyone help point out where I'm going wrong?


Solution

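  • Combine the conditions into a single boolean mask first, with each comparison wrapped in parentheses, and index the dataframe once with that mask. Standard syntax is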

    df[(df['a'] > X) & (df['a'] < Y)]
    

    or using query(), where X and Y stand for literal values (a Python variable must be prefixed with @, e.g. @X):

    df.query('X < a < Y')
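
Applied to the question's AgeRange, this could look like the sketch below. All three attempts in the question appear to fail for the same reason: dataset[dataset.age > 0] is already a filtered dataframe, so &, and, and np.logical_and end up combining two dataframes of different shapes rather than two boolean conditions. The small dataframe here is a made-up stand-in for insurance.csv, and the names age_range, age_range_query, low, and high are illustrative, not from the original post.

```python
import pandas as pd

# Hypothetical stand-in for insurance.csv; only the columns used here.
dataset = pd.DataFrame({
    "age": [5, 19, 20, 34, 0, 12],
    "sex": [1, 0, 1, 1, 0, 1],
})

def age_range(df, low, high):
    """Return the rows where low < age < high (both bounds exclusive)."""
    # Each comparison yields a boolean Series; & combines them element-wise.
    # The parentheses are needed because & binds tighter than > and <.
    mask = (df["age"] > low) & (df["age"] < high)
    return df[mask]

def age_range_query(df, low, high):
    """Same filter with query(); @ refers to local Python variables."""
    return df.query("@low < age < @high")

rows = age_range(dataset, 0, 20)
print(rows)
```

Both helpers take the dataframe as an argument instead of reading a global, which makes them easier to test on a small sample like the one above.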