I am trying to filter my dataframe on multiple conditions using loc
The data is coming from an Excel file that I imported into the dataframe using df = pd.read_csv()
Here is sample structure from the df using df.head() :
and I need to filter on the following criteria: State-Gov, Bachelors, Never-Married, Adm-Clerical, Not-in-family, White, Male, United States, <=50k
Here is the code that I wrote. It returns the correct columns, but no rows the df is empty :
df.loc[(df['Sector']=='State-Gov') & (df['Education']=='Bachelors') & (df['Married']=='Never-Married') &
(df['Job']=='Adm-Clerical') & (df['Family']=='Not-in-Family') & (df['Race']=='White') & (df['Gender']=='Male') &
(df['Location']=='United-States') & (df['Income']=='<=50k'), ['Sector', 'Education', 'Married', 'Job', 'Family', 'Race', 'Gender', 'Location', 'Income']]
Any help is appreciated, thank you!
The problem seems to be in "not-cleaned" data (whitespaces before and after string values).
Solution is to clean the data (using .str.strip()
etc.), for example:
print(df.loc[df["Sector"].str.strip() == "State-gov"])