I have extracted 1000 rules from a decision tree and saved them in a dataframe. Below is a sample rule:
(age > 25) & (Tenure < 48)
Now I want to check how many observations in the train dataframe follow each rule stored in a pandas dataframe (Data_rules). Basically, I want the length of the dataframe after applying each rule. Below is the code I have written:
for i in Data_rules.index:
    temp = len(train[Data_rules['Rules'][i]])
    output.append(temp)
This code throws a KeyError because Data_rules['Rules'][i] returns each rule as a string, such as '(age > 25) & (Tenure < 48)', but the rule needs to be passed to the train dataset without the quotes. Can anyone help me with this, please?
This is exactly what DataFrame.query is for. Here's an example:
import pandas as pd
df = pd.DataFrame({"age": [10, 15, 20, 25, 30, 35], "Tenure": [1, 1, 1, 1, 50, 47]})
result = df.query("(age > 25) & (Tenure < 48)")
print(result)
Output:
   age  Tenure
5   35      47
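To get the count for every rule, as the question asks, the same call can be applied to each rule string in turn. The following is a minimal sketch: the contents of train and Data_rules are made-up stand-ins for the asker's data, and the second rule is hypothetical, added only to show more than one count.

```python
import pandas as pd

# Hypothetical stand-ins for the asker's `train` and `Data_rules` dataframes.
train = pd.DataFrame({"age": [10, 15, 20, 25, 30, 35], "Tenure": [1, 1, 1, 1, 50, 47]})
Data_rules = pd.DataFrame({"Rules": ["(age > 25) & (Tenure < 48)", "(age <= 25)"]})

# query() evaluates each rule string as a boolean expression over train's columns;
# len() of the filtered dataframe is the number of observations matching that rule.
output = [len(train.query(rule)) for rule in Data_rules["Rules"]]
print(output)  # [1, 4]
```

If only the counts are needed, train.eval(rule).sum() should give the same numbers without building the filtered dataframe, since DataFrame.eval returns a boolean Series for an expression like this.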