
Binning a continuous variable using pandas.cut


I am trying to bin a continuous variable (net_revenue, which ranges from -2000 to 455) using pd.cut. However, I keep getting a SettingWithCopyWarning:

    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead

    See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

Here is my code:

import pandas as pd

# Keep only rows with a non-negative, non-missing tenure
oct20_df_clean = oct20_df[(oct20_df['tenure'] >= 0) &
                          (oct20_df['tenure'].notnull())]

# The warning is raised on this assignment
oct20_df_clean['bin_net_revenue'] = pd.cut(x=oct20_df_clean.loc[:, 'net_revenue'],
                                           bins=[-2000, -157.56, -44.81, 0.0, 28.58, 85.0, 114.25,
                                                 148.17, 148.58, 148.67, 148.83, 456],
                                           labels=['1%', '5%', '10%', '25%', '50%', '75%', '90%',
                                                   '95%', '97%', '99%', '100%'],
                                           precision=2)

Thanks!
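A side note on the binning itself, separate from the warning: pd.cut uses right-closed intervals by default, so the first bin is (-2000, -157.56] and a value of exactly -2000 would be assigned NaN. The sketch below runs the question's bins and labels on a few made-up values (hypothetical, not the asker's data) and compares the default with include_lowest=True, which makes the first interval include its left edge.

```python
import pandas as pd

# Bin edges and labels copied from the question: 12 edges -> 11 intervals -> 11 labels
bins = [-2000, -157.56, -44.81, 0.0, 28.58, 85.0, 114.25, 148.17, 148.58,
        148.67, 148.83, 456]
labels = ['1%', '5%', '10%', '25%', '50%', '75%', '90%', '95%', '97%',
          '99%', '100%']

# Hypothetical sample values chosen to hit the edges of a few bins
sample = pd.Series([-2000.0, -500.0, 0.0, 0.01, 148.6, 455.0])

# Default right=True: every interval is (left, right], so -2000 itself falls outside
default_cut = pd.cut(sample, bins=bins, labels=labels)

# include_lowest=True closes the first interval on the left: [-2000, -157.56]
inclusive_cut = pd.cut(sample, bins=bins, labels=labels, include_lowest=True)

print(pd.DataFrame({'value': sample,
                    'default': default_cut,
                    'include_lowest': inclusive_cut}))
```

If the real minimum of net_revenue is exactly -2000, passing include_lowest=True (or lowering the first edge slightly) keeps that row in the '1%' bin instead of producing NaN.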


Solution

  • The problem is not pd.cut but this line:

    oct20_df_clean = oct20_df[(oct20_df['tenure'] >= 0) &
                              (oct20_df['tenure'].notnull())]
    

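    Boolean indexing returns a new DataFrame, but pandas flags it as a possible copy of a slice of oct20_df. When you then add a column to oct20_df_clean, pandas cannot tell whether you also meant to modify oct20_df, so it emits the warning (the column is still added to oct20_df_clean; oct20_df is not changed). Chaining .copy() makes oct20_df_clean an explicitly independent DataFrame, and the warning goes away: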

    oct20_df_clean = oct20_df[(oct20_df['tenure'] >= 0) &
                              (oct20_df['tenure'].notnull())].copy()
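Putting the fix together, here is a minimal runnable sketch, assuming a small toy stand-in for oct20_df (the data and the helper name clean_and_bin are hypothetical). The filter is followed by .copy(), so the new column is added to an independent DataFrame and the original frame is left untouched.

```python
import pandas as pd

# Bin edges and percentile-style labels taken from the question
BINS = [-2000, -157.56, -44.81, 0.0, 28.58, 85.0, 114.25, 148.17, 148.58,
        148.67, 148.83, 456]
LABELS = ['1%', '5%', '10%', '25%', '50%', '75%', '90%', '95%', '97%',
          '99%', '100%']


def clean_and_bin(df: pd.DataFrame) -> pd.DataFrame:
    """Keep rows with a valid tenure, then add a binned net_revenue column (hypothetical helper)."""
    # .copy() makes the filtered frame independent of df, so adding a
    # column below does not warn and does not touch df
    clean = df[(df['tenure'] >= 0) & (df['tenure'].notnull())].copy()
    clean['bin_net_revenue'] = pd.cut(clean['net_revenue'], bins=BINS, labels=LABELS)
    return clean


# Hypothetical toy data standing in for oct20_df
oct20_df = pd.DataFrame({
    'tenure': [3, -1, None, 12],
    'net_revenue': [-500.0, 20.0, 100.0, 150.0],
})

oct20_df_clean = clean_and_bin(oct20_df)
print(oct20_df_clean)
```

Rows with a negative or missing tenure are dropped, and oct20_df itself does not gain a bin_net_revenue column.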