I have a data frame with more than 1000 rows and 200 columns, something like this:
my_data:
ID   f1   f2   ..   f200   Target
x1   3    0    ..   2      0
x2   6    2    ..   1      1
x3   5    4    ..   0      0
x4   0    5    ..   18     1
..   .    .    ..   ..     .
xn   13   0    ..   4      0
First, I want to automatically discretize these features (f1-f200) into four groups: no, low, medium and high. IDs that have zero in a column (e.g., x1 and xn in f2) should be labeled "no" for that column; the remaining values should be categorized as low, medium or high.
I found this:
pd.cut(my_data,3, labels=["low", "medium", "high"])
But this does not solve the problem. Any ideas?
You need to build the bins dynamically for each column and iterate over the columns. This can be done as follows:
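import numpy as np
import pandas as pd

new_df = pd.DataFrame()
for name, col in df1.items():  # df1: your DataFrame (numeric feature columns only); iteritems() was removed in pandas 2.0
    # (-inf, 0] -> 'no', then low/mid/high; edges must be strictly increasing or pd.cut raises a ValueError
    bins = [-np.inf, 0, col.min() + 1, col.mean(), col.max()]
    new_df[name] = pd.cut(col, bins=bins, include_lowest=False, labels=['no', 'low', 'mid', 'high'])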
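The bin edges above depend on each column's min and mean, so they can end up out of order for some columns. As an alternative, here is a minimal sketch (not the method above, just one possible variant) that labels non-positive values "no" first and then splits only the positive values into three equal-width bins with pd.cut. pd.cut works on one 1-D column at a time, which is why calling it on the whole frame does not work. The names discretize_column and discretize_features are made up for illustration, and equal-width bins are an assumption; pd.qcut could be swapped in if equal-sized groups are preferred.

```python
import pandas as pd

LABELS = ["no", "low", "medium", "high"]


def discretize_column(col: pd.Series) -> pd.Series:
    """Hypothetical helper: 'no' for zeros, equal-width low/medium/high for positives."""
    # Start with every row labelled "no". Anything that is not > 0
    # (zeros, and in this sketch also negatives and NaN) keeps that label.
    result = pd.Series("no", index=col.index, dtype=object)
    mask = col > 0
    if mask.any():
        # Three equal-width bins computed from the positive values only.
        binned = pd.cut(col[mask], bins=3, labels=LABELS[1:])
        result[mask] = binned.astype(str).to_numpy()
    # Return an ordered categorical so that no < low < medium < high.
    return result.astype(pd.CategoricalDtype(LABELS, ordered=True))


def discretize_features(df: pd.DataFrame, feature_cols) -> pd.DataFrame:
    """Hypothetical helper: apply discretize_column to each listed feature column."""
    return pd.DataFrame({c: discretize_column(df[c]) for c in feature_cols})
```

Usage, assuming the ID and Target columns are named as in the question: pick the feature columns with something like features = [c for c in my_data.columns if c not in ("ID", "Target")] and call discretize_features(my_data, features).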