I am storing large xarray DataArrays in my DataFrame, but any time I display the DataFrame in Jupyter or the terminal it takes way too long (11 seconds for a 10-row DataFrame). My guess is that pandas builds the repr for each cell from the whole xarray object and only truncates the display after the fact, but I'm not sure.
Is there some pandas setting that will limit this behavior?
Here's the code:
import pandas as pd
import numpy as np
import xarray as xr
df = pd.DataFrame({'xarrays': [xr.DataArray(np.random.randn(50, 50))
                               for _ in range(10)],  # 10 50x50 xarrays
                   'other_stuff': np.arange(10)})
The attached image shows the time for displaying the whole frame, the xarray series, and a normal series, but here is the quick breakdown:
Display Type | Time |
---|---|
Whole df | 11 s |
Xarray series | 6 s |
normal series | 0 s |
directly displaying df repr_html | 0.2 s |
I expected the xarray rows to be displayed abbreviated/truncated without much fuss. Instead, it takes way too long just to display the frame.
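For anyone who wants to reproduce the numbers in the table without the screenshot, here is a rough timing sketch. It assumes the same `df` as above; the helper name `seconds_to_render` is made up for illustration, and it times the plain-text `repr` and `_repr_html_` directly rather than a full Jupyter display, so the absolute numbers will differ from machine to machine.

```python
import time

import numpy as np
import pandas as pd
import xarray as xr

# Same data as in the question: 10 cells, each holding a 50x50 DataArray.
df = pd.DataFrame({
    'xarrays': [xr.DataArray(np.random.randn(50, 50)) for _ in range(10)],
    'other_stuff': np.arange(10),
})


def seconds_to_render(render):
    """Time a single call to a zero-argument rendering function (hypothetical helper)."""
    start = time.perf_counter()
    render()
    return time.perf_counter() - start


timings = {
    'whole df (text repr)': seconds_to_render(lambda: repr(df)),
    'xarray series (text repr)': seconds_to_render(lambda: repr(df['xarrays'])),
    'normal series (text repr)': seconds_to_render(lambda: repr(df['other_stuff'])),
    'whole df (_repr_html_)': seconds_to_render(df._repr_html_),
}

for label, seconds in timings.items():
    print(f'{label:28s} {seconds:6.2f} s')
```

Timing the text and HTML paths separately should make it easier to see which one is responsible for the delay.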
Solved it! Apparently the option I was looking for was pandas' display.pprint_nest_depth. After limiting it to 1, things sped up considerably, but I'm not yet sure of the implications of that.
# ------- Same dataframe as before ----------------
import pandas as pd
import numpy as np
import xarray as xr
df = pd.DataFrame({'xarrays': [xr.DataArray(np.random.randn(50, 50))
                               for _ in range(10)],  # 10 50x50 xarrays
                   'other_stuff': np.arange(10)})
# ------- Experimenting with pprint settings ----------------
import IPython.display
# NOTE: apparently my computer has sped up a bit, so the default display time has dropped from 11 seconds to 5 seconds,
# but that is still way too slow
pd.set_option('display.pprint_nest_depth', 3)  # (default)
IPython.display.display(df)  # 5.4 seconds
IPython.display.display(df.xarrays)  # 2.7 seconds (default)
pd.set_option('display.pprint_nest_depth', 2)
IPython.display.display(df)  # also 5.4 seconds
IPython.display.display(df.xarrays)  # also 2.7 seconds
pd.set_option('display.pprint_nest_depth', 1)
# SUCCESS!
IPython.display.display(df)  # 0.2 seconds
IPython.display.display(df.xarrays)  # 0.1 seconds
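Since I'm not sure what else a global pprint_nest_depth of 1 affects, one way to limit the blast radius is to change the option only while rendering. This is a minimal sketch using `pd.option_context`, which restores the previous value on exit; the variable names are just for illustration, and I use `repr(df)` here instead of `IPython.display.display` so it also runs outside a notebook.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Same dataframe as before.
df = pd.DataFrame({
    'xarrays': [xr.DataArray(np.random.randn(50, 50)) for _ in range(10)],
    'other_stuff': np.arange(10),
})

# Remember whatever the option is set to right now (3 unless changed).
depth_before = pd.get_option('display.pprint_nest_depth')

# Lower the nesting depth only inside this block.
with pd.option_context('display.pprint_nest_depth', 1):
    depth_inside = pd.get_option('display.pprint_nest_depth')
    text = repr(df)  # rendered with the reduced nesting depth

# Outside the block the earlier setting is back in place.
depth_after = pd.get_option('display.pprint_nest_depth')

print(text)
print(depth_before, depth_inside, depth_after)
```

In a notebook the same `with` block can wrap `IPython.display.display(df)` instead of `repr(df)`, so other dataframes in the session keep the default formatting.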