I am using the dataprep API in Google Colab and get a recursion error when I call its functions. Oddly, it works fine on the uncleaned data with 144 features. But once I reduce the data to 20 features and clean the missing values, I get a recursion error.
Code:
df.isna().sum()
Output:
grade 0
sub_grade 0
emp_length 0
home_ownership 0
annual_inc 0
verification_status 0
loan_status 0
purpose 0
dti 0
delinq_2yrs 0
inq_last_6mths 0
mths_since_last_delinq 0
open_acc 0
pub_rec 0
revol_bal 0
revol_util 0
total_acc 0
recoveries 0
pub_rec_bankruptcies 0
tax_liens 0
dtype: int64
import sys
sys.setrecursionlimit(15000)
from dataprep.eda import create_report, plot, plot_correlation
create_report(df)
Error:
---------------------------------------------------------------------------
RecursionError Traceback (most recent call last)
<ipython-input-55-463fb2fdfb17> in <module>
----> 1 create_report(df)
33 frames
... last 10 frames repeated, from the frame below ...
/usr/local/lib/python3.8/dist-packages/pandas/core/series.py in __repr__(self)
1463 show_dimensions = get_option("display.show_dimensions")
1464
-> 1465 self.to_string(
1466 buf=buf,
1467 name=self.name,
RecursionError: maximum recursion depth exceeded
Following the advice of the first answer, I was able to go through one series at a time and it looks like this code is causing the issue. How can this be written better?
# these columns will take the median value for fillna
median_fill = ['emp_length','annual_inc','open_acc','pub_rec','open_acc','revol_util','total_acc']
for med in median_fill:
df[med].fillna(df[med].median,inplace=True)
You omitted important details from the stack trace.
But if I had to guess, here's what's happening.
Something in create_report wound up calling repr(foo), where foo is a complex custom object.
In the course of computing self.to_string( ... ) we wound up accidentally calling either to_string or repr(foo) again.
That is essentially a while True: loop, so .setrecursionlimit() won't help.
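To make the repr-calls-repr loop concrete, here is a minimal pure-Python sketch. Column is a hypothetical stand-in written for this illustration, not the pandas Series class. It relies on one CPython behaviour: the repr of a bound method includes the repr of the object it is bound to.

```python
class Column:
    """Hypothetical stand-in for a Series; not the real pandas class."""

    def __init__(self, values):
        self.values = list(values)

    def median(self):
        # Upper median of the numeric values, good enough for illustration.
        nums = sorted(v for v in self.values if isinstance(v, (int, float)))
        return nums[len(nums) // 2]

    def __repr__(self):
        # Printing the column prints every element it holds.
        return "Column(" + ", ".join(repr(v) for v in self.values) + ")"


broken = Column([1.0, 2.0, 3.0])
broken.values[1] = broken.median  # bug: stores the method object, not a number

try:
    # repr(broken) -> repr(bound method) -> repr(broken) -> ... without end
    repr(broken)
    hit_recursion = False
except RecursionError:
    hit_recursion = True

fixed = Column([1.0, 2.0, 3.0])
fixed.values[1] = fixed.median()  # calling the method stores a plain float
fixed_repr = repr(fixed)
```

Raising the recursion limit only makes the first case fail later, because the cycle never terminates.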
You want to understand what foo is all about, in order to properly diagnose the root cause and then fix this.
Start with a simpler report, and build up to the point where you trigger the error.
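One way to narrow things down column by column, sketched with pandas only. find_callable_cells is a hypothetical helper name, and the demo frame is made up for illustration. It assumes the offending foo is a callable object (such as a function or method) that ended up stored in a cell, which may not be the cause in every case.

```python
import pandas as pd


def find_callable_cells(df):
    """Return {column name: count of cells holding a callable object}."""
    suspects = {}
    for col in df.columns:
        # callable(x) is True for functions and methods, False for numbers.
        count = int(df[col].map(callable).sum())
        if count:
            suspects[col] = count
    return suspects


# Made-up demo data: column "b" accidentally holds a function object.
demo = pd.DataFrame({"a": [1.0, 2.0], "b": [1.0, len]})
suspects = find_callable_cells(demo)
```

Any column reported here is worth inspecting with df[col].dtype and df[col].map(type).value_counts() before passing the frame to create_report.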
EDIT
You wrote
df[med].fillna(df[med].median, inplace=True)
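Don't do that. First, median is a method, so call it: df[med].median(). Without the parentheses, fillna receives the bound method itself rather than a number and can store it in the column. Printing a bound method prints the Series it belongs to, which would explain the repr recursion. Second, rather than inplace, prefer this:
df[med] = df[med].fillna(df[med].median())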
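A minimal sketch of the median fill with the method actually called (note the parentheses on median()). The column names come from the question; the values are made up for illustration.

```python
import pandas as pd

# Made-up sample with one missing value per column.
df = pd.DataFrame(
    {
        "annual_inc": [50000.0, None, 70000.0],
        "open_acc": [3.0, 5.0, None],
    }
)

# These columns take the median value for fillna.
median_fill = ["annual_inc", "open_acc"]
for col in median_fill:
    # median() returns a float; assigning back avoids inplace=True.
    df[col] = df[col].fillna(df[col].median())

remaining_missing = int(df.isna().sum().sum())
```

After this loop the columns keep their float dtype, so printing them or passing the frame to a report generator only ever formats numbers.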